What is generative AI in pharma?

Generative AI in pharma refers to models that create novel outputs — new small-molecule structures, never-before-seen protein and antibody sequences, candidate targets, or drafted regulatory documents — rather than merely scoring or classifying existing data. It contrasts with predictive AI, which ranks or filters known options. In practice the highest-value uses today are de novo molecule design, generative protein/antibody design, and AI-assisted authoring of trial and regulatory documents.

Does generative AI actually work in drug discovery?

For molecule generation, yes, with caveats. The clearest proof is Insilico Medicine’s rentosertib, an AI-discovered and AI-designed molecule whose positive Phase IIa results were published in Nature Medicine in 2025. Protein-design tools from the AlphaFold lineage (structure prediction) and the David Baker lineage (de novo design) have also produced functional, lab-validated proteins. What remains unproven is whether generative design raises the overall probability of a drug reaching approval, versus simply generating candidates faster.

How does the FDA regulate AI in drug development?

In January 2025 the FDA issued its first draft guidance on the subject, “Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products,” built around a 7-step credibility-assessment framework keyed to model risk and context of use. Importantly, it covers AI used to support regulatory decisions in the nonclinical, clinical, post-marketing, and manufacturing phases, but explicitly does not govern AI used purely in drug discovery. So a generatively designed molecule is not itself regulated differently; the scrutiny attaches to how AI informs filings.

Who owns a molecule designed by generative AI?

This is an unsettled and deal-critical question. Patent offices generally require a human inventor, so AI-generated inventions raise inventorship and validity risk if not carefully documented with human contribution. For dealmakers, the practical response is diligence: confirm clear human-in-the-loop records, clean data provenance, and that freedom-to-operate and patent claims do not hinge on AI being named as inventor.

Generative AI in Pharma: What Dealmakers Need to Know

Generative vs Predictive AI

The single most useful distinction in this space is also the one most often blurred in marketing. Predictive AI takes existing options and scores them — which of these compounds is likely toxic, which patients are likely to respond. It has been quietly useful in pharma for years. Generative AI does something different and newer: it creates outputs that did not exist before — a novel small-molecule structure, an entirely new protein or antibody sequence, a drafted clinical protocol.

The distinction matters commercially because generation is what can produce novel, ownable assets. A better toxicity filter improves efficiency; a newly designed molecule can become patentable intellectual property. In practice the leading platforms chain the two — generate a wide space of candidates, then use predictive models to filter to the few worth synthesizing.
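The generate-then-filter pattern can be sketched in a few lines of Python. This is a toy illustration, not any platform's actual pipeline: `generate_candidates`, `filter_candidates`, the `Candidate` fields, and the thresholds are all hypothetical stand-ins, and the "molecules" are just labelled records with random scores.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    predicted_potency: float   # hypothetical score in [0, 1], higher is better
    predicted_toxicity: float  # hypothetical score in [0, 1], lower is better


def generate_candidates(n: int, seed: int = 0) -> list[Candidate]:
    """Stand-in for a generative model: expands the search space."""
    rng = random.Random(seed)
    return [
        Candidate(f"cand-{i:04d}", rng.random(), rng.random())
        for i in range(n)
    ]


def filter_candidates(
    candidates: list[Candidate], max_toxicity: float, top_k: int
) -> list[Candidate]:
    """Stand-in for predictive models: narrows the space to a shortlist."""
    passing = [c for c in candidates if c.predicted_toxicity <= max_toxicity]
    passing.sort(key=lambda c: c.predicted_potency, reverse=True)
    return passing[:top_k]


pool = generate_candidates(1000)
shortlist = filter_candidates(pool, max_toxicity=0.2, top_k=5)
print(f"{len(pool)} generated, {len(shortlist)} shortlisted for synthesis")
```

The design point is the asymmetry: the generative step is what creates the candidates that could become ownable assets, while the predictive step only decides which of them are worth the cost of synthesis.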

Where dealmakers go wrong is taking a platform’s “generative AI” branding at face value. Plenty of tools marketed as generative are, under the hood, mostly predictive — valuable, but not a source of defensible IP. The diligence question is simple and clarifying: does this platform create a novel composition of matter that can be owned, or does it optimize the search over molecules that already exist? The answer determines whether you are buying an asset engine or an efficiency tool — and they are worth very different multiples.

Generative vs predictive AI in pharma

Dimension	Generative	Predictive	Takeaway
What it does	Creates new molecules/proteins/text	Scores, ranks, or classifies existing options	Generation expands the search space; prediction narrows it.
Typical use	De novo design, protein design, document drafting	Toxicity prediction, hit triage, patient stratification	Most platforms use both: generate, then predict/filter.
Key risk	Plausible but invalid outputs (“hallucination”)	Bias and brittleness on out-of-distribution data	Generative outputs must be experimentally validated, always.
Deal signal	Novel IP, but inventorship questions	Efficiency gains, harder to defend as moat	Generation can create defensible assets; prediction rarely does alone.

Where It Actually Works

Four uses have moved from demo to genuine value:

De novo small-molecule design. Generative chemistry models propose novel structures against a target, which are then synthesized and tested. This is the core of platforms like Insilico, Iambic, and Genesis.
Generative protein and antibody design. The AlphaFold lineage (structure prediction) and the David Baker lineage (de novo protein design, which seeded Xaira) can now generate functional, lab-validated proteins and binders — expanding the design space for biologics.
Target and hypothesis generation. Models propose novel disease targets from multi-omic data — the step that produced the TNIK target behind Insilico’s rentosertib.
Regulatory and trial document drafting. Large language models draft protocols, CSRs, and submission documents — the least glamorous but fastest-adopted use, and the one most directly touched by FDA guidance (below).

The Generative Toolbox

“Generative AI” is an umbrella over several distinct techniques, and the distinctions matter when you diligence a platform — each has a different failure mode:

Diffusion models for molecules and proteins. The same family of models behind AI image generation has been adapted to “denoise” novel chemical structures and protein backbones into existence. The David Baker lab’s diffusion-based protein design (which seeded Xaira) is the headline example.
Protein language models. Trained on the universe of known protein sequences, these models generate plausible new sequences with desired properties — the basis of much antibody and enzyme design.
Structure prediction as scaffolding. AlphaFold and its successors are not generative per se, but they provide the structural ground truth that generative design builds on — predict the fold, then design against it.
Large language models for chemistry and documents. LLMs both propose molecules in text-based chemical representations and draft the regulatory and trial documents that surround a program.

The practical point: a platform’s technique determines what it is good at and where it breaks. A protein-design shop and a small-molecule generative-chemistry shop are not interchangeable, and neither is automatically credible at the other’s job.

The Proof — and Its Limits

For years generative AI in pharma was a promise without a clinical receipt. That changed in 2025: Insilico Medicine’s rentosertib — a TNIK inhibitor for idiopathic pulmonary fibrosis whose target was nominated by AI and whose molecule was generated by AI — posted a positive Phase IIa, published in Nature Medicine. On the biologics side, generative protein-design tools have produced functional binders validated in the lab.

What is — and isn’t — proven

Generative AI has proven it can design molecules and proteins that are active in the real world. It has not proven that it raises the probability a drug survives Phase III and reaches approval — the only metric that ultimately justifies the investment. Treat “AI-designed” as a statement about origin, not about odds of success. Every generative output still faces the same clinical gauntlet as any molecule.

A Worked Example: Rentosertib

It helps to trace one molecule end to end, because rentosertib is the clearest illustration of what generative AI can — and cannot yet — claim. The program began not with a molecule but with a target: Insilico’s platform analyzed multi-omic and text data to nominate TNIK, a kinase implicated in fibrosis, as a novel anti-fibrotic target. That is generative AI applied to hypothesis generation.

Next came generative chemistry: the platform designed novel small molecules to hit TNIK with the right potency, selectivity, and drug-like properties — iterating in silico before synthesis. Insilico has reported reaching a clinical candidate in roughly 18 months, versus a traditional 4–6 years. The molecule then ran the ordinary gauntlet — IND-enabling studies, Phase I in healthy volunteers (published in Nature Biotechnology), and the Phase IIa in IPF patients that read out positively in Nature Medicine (mean +98.4 mL FVC at 60 mg QD vs −20.3 mL for placebo).
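The arithmetic behind those headline figures is worth making explicit. The snippet below uses only the numbers quoted above; the variable names are ours, the FVC figure is a plain difference of the two arm means (not the trial's statistical estimate), and the speed-up range simply divides the traditional 4 to 6 years by the reported 18 months.

```python
# Figures quoted in the text above; nothing here is new data.
fvc_change_treated_ml = 98.4    # mean FVC change at 60 mg QD
fvc_change_placebo_ml = -20.3   # mean FVC change on placebo

# Simple difference between the two arm means.
fvc_difference_ml = round(fvc_change_treated_ml - fvc_change_placebo_ml, 1)

ai_months = 18
traditional_months = (4 * 12, 6 * 12)   # 4 to 6 years, in months

# How many times faster the reported timeline is, at the low and high end.
speedup_low = round(traditional_months[0] / ai_months, 2)
speedup_high = round(traditional_months[1] / ai_months, 2)

print(fvc_difference_ml)          # 118.7
print(speedup_low, speedup_high)  # 2.67 4.0
```

So the treated arm finished roughly 119 mL ahead of placebo on mean FVC change, and the reported discovery timeline was roughly 2.7 to 4 times shorter than the traditional range.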

What the example proves: AI compressed discovery and produced a molecule active in patients. What it does not prove: that the drug will clear Phase III, or that AI raised its odds of doing so. The honest reading is that generative AI shortened the path to a credible shot on goal — not that it changed the probability the shot goes in.

How the FDA Sees It

In January 2025, the FDA issued its first draft guidance on the subject: Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products. It is built around a 7-step credibility-assessment framework that scales scrutiny to the model’s risk and “context of use” — sponsors must define the regulatory question the model addresses, assess its risk, and document credibility.

The crucial nuance for dealmakers: the guidance governs AI used to support regulatory decisions across nonclinical, clinical, post-marketing, and manufacturing phases — but it explicitly does not cover AI used purely in drug discovery. So a generatively designed molecule is not subject to a different approval standard for being AI-made. The regulatory weight attaches to how AI informs your filings, not to the molecule’s origin. That is reassuring for asset value and a reminder to keep discovery and regulatory-grade AI cleanly separated in documentation.

In practice, the framework asks sponsors to define the context of use (exactly what the model does and what decision it informs), assess model risk (a function of how influential the model is and how serious the consequence of being wrong), and then scale the evidence (data, validation, monitoring) to that risk. The comment period ran to April 2025, and the direction of travel is clear: AI that touches a regulatory decision will need a documented credibility case proportionate to its influence. For dealmakers, the read-through is that a target’s AI-enabled regulatory workflows are now a diligence item: well-documented model governance is an asset; ad hoc AI use in filings is a latent risk.
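The risk logic in that framework can be sketched as a simple lookup. This is an illustration under our own assumptions, not the FDA's scoring method: the three-level scale, the `model_risk` rule, and the evidence tiers are hypothetical. The guidance itself describes model risk qualitatively, as a combination of model influence and decision consequence.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def model_risk(influence: Level, consequence: Level) -> Level:
    """Hypothetical rule: combine how much the model drives the decision
    with how serious a wrong decision would be."""
    score = influence * consequence  # ranges from 1 to 9
    if score >= 6:
        return Level.HIGH
    if score >= 3:
        return Level.MEDIUM
    return Level.LOW


# Hypothetical evidence tiers, scaled to risk as the text describes.
EVIDENCE = {
    Level.LOW: "basic documentation of data and validation",
    Level.MEDIUM: "independent test data and performance monitoring",
    Level.HIGH: "extensive validation, ongoing monitoring, early FDA engagement",
}

# Example: an LLM drafts text that humans fully review (low influence),
# feeding a high-consequence submission.
risk = model_risk(Level.LOW, Level.HIGH)
print(risk.name, "->", EVIDENCE[risk])
```

In diligence terms, the useful question is whether a target can show this mapping for each AI tool that touches a filing: what it is used for, how much it drives the decision, and what evidence backs it.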

The Hard Questions: IP, Data, Hallucination

Three issues separate a fundable generative-AI asset from a liability:

Inventorship and patentability. Patent systems generally require a human inventor. An asset whose key claims depend on AI being the inventor is exposed. Clean, contemporaneous documentation of human contribution is the mitigant — and a diligence must-check.
Data provenance and rights. Generative models are only as clean as their training data. If a platform trained on third-party or improperly licensed data, downstream assets can carry contamination risk. Confirm data lineage.
Hallucination and validation. Generative models produce plausible-looking outputs that may be chemically or biologically invalid. The only defense is experimental validation — which is why the credible platforms are wet-lab heavy, not just compute-heavy.

Build, Buy, or Partner?

For a pharma or biotech deciding how to access generative AI, there are three routes, and the market is using all three at once:

Build. Stand up an internal generative-AI group and compute. Eli Lilly’s up-to-$1B supercomputer collaboration with NVIDIA is the extreme version — appropriate only for the largest players with the data and talent to justify it.
Buy / license capability. Bring a platform’s tooling in-house, as Lilly did with Chai Discovery, or take a subscription-style license, as GSK did with Noetik. This suits companies that want the capability embedded in their own workflows.
Partner on assets. Run a multi-target research collaboration (the Isomorphic template) where the platform designs and the pharma develops. This is the dominant model and the lowest-commitment way to access a frontier platform.

The right answer depends on how core AI is to the strategy and how much proprietary data the buyer brings. Most companies should partner first and build only where they have a genuine data advantage — generative models without proprietary data and wet-lab loops rarely justify the cost of building. For the structures behind each route, see our AI Drug Discovery Deal Tracker.

Where Generative AI Falls Short

A credible view of generative AI is as clear about its limits as about its promise. Four constraints recur, and each should temper how much a deal pays for an AI story:

Biology, not chemistry, is the hard part. Generative models are strongest at designing molecules with desired chemical properties. They are far weaker at predicting how a molecule behaves in a living system — efficacy, toxicity, and clinical response remain stubbornly empirical.
Data scarcity in the places that matter. Models learn from data; for novel targets and rare diseases, the relevant data is thin or absent, exactly where the value would be highest.
Validation is still wet-lab-bound. Every generated candidate must be synthesized and tested. AI changes the ratio of ideas to experiments — it does not remove the experiments.
Garbage in, confident garbage out. Generative models produce fluent, plausible outputs even when wrong, which can lend false confidence to a flawed candidate or hypothesis.

The mature position: generative AI is a powerful accelerant of the earliest, cheapest stages of discovery, and a far weaker predictor of the expensive, late-stage outcomes that determine a drug’s value. Price it accordingly.

The Dealmaker’s Lens

Put commercially: generation that yields novel, patentable, well-documented assets is worth paying for; efficiency-only predictive tooling rarely justifies a premium on its own. When you evaluate a generative-AI opportunity, separate the two, verify the IP and data foundations, and discount any claim that hasn’t been validated in the lab or clinic.

Timing matters too. The window to engage frontier generative platforms on favorable terms is widest before their first clinical proof point and narrowest after — the same dynamic that made Insilico’s pre-readout partnerships look prescient in hindsight. For an asset-holder, the mirror image applies: an AI-originated molecule with clean IP and early human data commands a premium precisely because the market has learned that AI design can translate. The edge, as ever, goes to whoever can value the evidence correctly while it is still contested.

For the broader market and deal-structure context, see our AI in Drug Development: The Dealmaker’s Guide; for who is building these platforms, the AI Drug Discovery Companies power list; and for structuring the resulting deals, our cross-border licensing term sheet guide.

Vision Lifesciences diligences and structures generative-AI and cross-border deals, including the IP and data questions that decide whether an AI asset is fundable.
