Generative AI in Pharma: What Dealmakers Need to Know
Past the hype, generative AI is doing real, specific work in drug development — and raising real, specific questions about regulation and intellectual property. Here is the version a dealmaker can act on.

Generative vs Predictive AI
The single most useful distinction in this space is also the one most often blurred in marketing. Predictive AI takes existing options and scores them — which of these compounds is likely toxic, which patients are likely to respond. It has been quietly useful in pharma for years. Generative AI does something different and newer: it creates outputs that did not exist before — a novel small-molecule structure, an entirely new protein or antibody sequence, a drafted clinical protocol.
The distinction matters commercially because generation is what can produce novel, ownable assets. A better toxicity filter improves efficiency; a newly designed molecule can become patentable intellectual property. In practice the leading platforms chain the two — generate a wide space of candidates, then use predictive models to filter to the few worth synthesizing.
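To make the generate-then-filter chain concrete, here is a minimal Python sketch. Everything in it is a toy assumption rather than any platform's actual method: the four-token alphabet is not a real chemical representation, `generate` stands in for a trained generative model, and `score` stands in for a trained predictive model.

```python
import random

# Toy token alphabet (hypothetical); not a real chemical representation.
ALPHABET = ["C", "N", "O", "S"]


def generate(n, length=8, seed=0):
    """Generative step (stand-in): create n candidate strings that did not exist before."""
    rng = random.Random(seed)
    return ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(n)]


def score(candidate):
    """Predictive step (stand-in): score an existing candidate.

    The rule here (reward 'N', penalize 'S') is arbitrary and purely illustrative.
    """
    return candidate.count("N") - 2 * candidate.count("S")


def shortlist(candidates, k):
    """Filter the generated space down to the k best-scoring unique candidates."""
    unique = sorted(set(candidates))
    return sorted(unique, key=score, reverse=True)[:k]


candidates = generate(1000)       # generation expands the search space
top = shortlist(candidates, 5)    # prediction narrows it
for c in top:
    print(c, score(c))
```

The shape is the point: the generator widens the funnel, the scorer narrows it, and only the shortlist would go on to synthesis and testing.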
Where dealmakers go wrong is taking a platform’s “generative AI” branding at face value. Plenty of tools marketed as generative are, under the hood, mostly predictive — valuable, but not a source of defensible IP. The diligence question is simple and clarifying: does this platform create a novel composition of matter that can be owned, or does it optimize the search over molecules that already exist? The answer determines whether you are buying an asset engine or an efficiency tool — and they are worth very different multiples.
Generative vs predictive AI in pharma
| Dimension | Generative | Predictive | Takeaway |
|---|---|---|---|
| What it does | Creates new molecules/proteins/text | Scores, ranks, or classifies existing options | Generation expands the search space; prediction narrows it. |
| Typical use | De novo design, protein design, document drafting | Toxicity prediction, hit triage, patient stratification | Most platforms use both: generate, then predict/filter. |
| Key risk | Plausible but invalid outputs (“hallucination”) | Bias and brittleness on out-of-distribution data | Generative outputs must be experimentally validated, always. |
| Deal signal | Novel IP, but inventorship questions | Efficiency gains, harder to defend as moat | Generation can create defensible assets; prediction rarely does alone. |
Where It Actually Works
Four uses have moved from demo to genuine value:
- De novo small-molecule design. Generative chemistry models propose novel structures against a target, which are then synthesized and tested. This is the core of platforms like Insilico, Iambic, and Genesis.
- Generative protein and antibody design. The AlphaFold lineage (structure prediction) and the David Baker lineage (de novo protein design, which seeded Xaira) can now generate functional, lab-validated proteins and binders — expanding the design space for biologics.
- Target and hypothesis generation. Models propose novel disease targets from multi-omic data — the step that produced the TNIK target behind Insilico’s rentosertib.
- Regulatory and trial document drafting. Large language models draft protocols, CSRs, and submission documents — the least glamorous but fastest-adopted use, and the one most directly touched by FDA guidance (below).
The Generative Toolbox
“Generative AI” is an umbrella over several distinct techniques, and the distinctions matter when you diligence a platform — each has a different failure mode:
- Diffusion models for molecules and proteins. The same family of models behind AI image generation has been adapted to “denoise” novel chemical structures and protein backbones into existence. The David Baker lab’s diffusion-based protein design (which seeded Xaira) is the headline example.
- Protein language models. Trained on the universe of known protein sequences, these models generate plausible new sequences with desired properties — the basis of much antibody and enzyme design.
- Structure prediction as scaffolding. AlphaFold and its successors are not generative per se, but they provide the structural ground truth that generative design builds on — predict the fold, then design against it.
- Large language models for chemistry and documents. LLMs both propose molecules in text-based chemical representations and draft the regulatory and trial documents that surround a program.
The practical point: a platform’s technique determines what it is good at and where it breaks. A protein-design shop and a small-molecule generative-chemistry shop are not interchangeable, and neither is automatically credible at the other’s job.
The Proof — and Its Limits
For years generative AI in pharma was a promise without a clinical receipt. That changed in 2025: Insilico Medicine’s rentosertib — a TNIK inhibitor for idiopathic pulmonary fibrosis whose target was nominated by AI and whose molecule was generated by AI — posted a positive Phase IIa, published in Nature Medicine. On the biologics side, generative protein-design tools have produced functional binders validated in the lab.
What is — and isn’t — proven
A Worked Example: Rentosertib
It helps to trace one molecule end to end, because rentosertib is the clearest illustration of what generative AI can — and cannot yet — claim. The program began not with a molecule but with a target: Insilico’s platform analyzed multi-omic and text data to nominate TNIK, a kinase implicated in fibrosis, as a novel anti-fibrotic target. That is generative AI applied to hypothesis generation.
Next came generative chemistry: the platform designed novel small molecules to hit TNIK with the right potency, selectivity, and drug-like properties, iterating in silico before synthesis. Insilico has reported reaching a preclinical candidate in roughly 18 months, versus a traditional 4–6 years. The molecule then ran the ordinary gauntlet: IND-enabling studies, Phase I in healthy volunteers (published in Nature Biotechnology), and the Phase IIa in IPF patients that read out positively in Nature Medicine (mean change in forced vital capacity, FVC, of +98.4 mL at 60 mg QD vs −20.3 mL for placebo).
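The headline numbers in this paragraph reduce to two pieces of arithmetic, sketched below in Python. The figures are the ones quoted above; the variable names are illustrative, and the placebo-adjusted gap is a simple difference of reported means, not a reproduction of the trial's statistical analysis.

```python
# Phase IIa readout as quoted above (mean change in FVC, in mL).
fvc_treated_ml = 98.4    # 60 mg QD arm
fvc_placebo_ml = -20.3   # placebo arm

# Simple difference of means; not the trial's own statistical model.
placebo_adjusted_ml = round(fvc_treated_ml - fvc_placebo_ml, 1)

# Discovery timeline as quoted above, converted to months.
ai_months = 18
traditional_months = (4 * 12, 6 * 12)   # 4 to 6 years
speedup = tuple(round(m / ai_months, 1) for m in traditional_months)

print(f"Placebo-adjusted FVC gap: {placebo_adjusted_ml} mL")          # 118.7 mL
print(f"Timeline compression: {speedup[0]}x to {speedup[1]}x faster")  # 2.7x to 4.0x
```

Read this way, the claim is a roughly 119 mL gap against placebo in a small, early trial, reached after a discovery phase about three to four times shorter than the traditional range.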
What the example proves: AI compressed discovery and produced a molecule active in patients. What it does not prove: that the drug will clear Phase III, or that AI raised its odds of doing so. The honest reading is that generative AI shortened the path to a credible shot on goal — not that it changed the probability the shot goes in.
How the FDA Sees It
In January 2025, the FDA issued its first draft guidance on the subject: Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products. It is built around a 7-step credibility-assessment framework that scales scrutiny to the model’s risk and “context of use” — sponsors must define the regulatory question the model addresses, assess its risk, and document credibility.
The crucial nuance for dealmakers: the guidance governs AI used to support regulatory decisions across nonclinical, clinical, post-marketing, and manufacturing phases — but it explicitly does not cover AI used purely in drug discovery. So a generatively designed molecule is not subject to a different approval standard for being AI-made. The regulatory weight attaches to how AI informs your filings, not to the molecule’s origin. That is reassuring for asset value and a reminder to keep discovery and regulatory-grade AI cleanly separated in documentation.
In practice, the framework asks sponsors to define the context of use (exactly what the model does and what decision it informs), assess model risk (a function of how influential the model is and how serious the consequence of being wrong), and then scale the evidence (data, validation, monitoring) to that risk. The comment period ran to April 2025, and the direction of travel is clear: AI that touches a regulatory decision will need a documented credibility case proportionate to its influence. For dealmakers, the read-through is that an acquisition target’s AI-enabled regulatory workflows are now a diligence item: well-documented model governance is an asset; ad-hoc AI use in filings is a latent risk.
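One way to picture the risk logic is as a two-factor matrix. The Python sketch below is an illustration only: the draft guidance describes model risk in terms of model influence and decision consequence, but the three-level scale, the multiplication rule, the thresholds, and the evidence tiers here are all hypothetical choices made for this example.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def model_risk(influence: Level, consequence: Level) -> Level:
    """Combine model influence and decision consequence into a risk tier.

    Hypothetical rule: multiply the two levels and bucket the product.
    The guidance itself does not prescribe a numeric formula.
    """
    product = int(influence) * int(consequence)
    if product >= 6:
        return Level.HIGH
    if product >= 3:
        return Level.MEDIUM
    return Level.LOW


# Hypothetical evidence tiers, scaled to risk.
EVIDENCE = {
    Level.LOW: ["data description", "basic validation"],
    Level.MEDIUM: ["data description", "independent validation", "change log"],
    Level.HIGH: ["data description", "independent validation", "change log",
                 "ongoing monitoring plan"],
}

# Example: the model is one input among several (LOW influence),
# but a wrong answer would be serious (HIGH consequence).
tier = model_risk(Level.LOW, Level.HIGH)
print(tier.name, EVIDENCE[tier])
```

In diligence terms, the useful question is whether the company can show a mapping like this for each model that touches a filing, whatever its exact form.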
The Hard Questions: IP, Data, Hallucination
Three issues separate a fundable generative-AI asset from a liability:
- Inventorship and patentability. Patent systems generally require a human inventor. An asset whose key claims depend on AI being the inventor is exposed. Clean, contemporaneous documentation of human contribution is the mitigant — and a diligence must-check.
- Data provenance and rights. Generative models are only as clean as their training data. If a platform trained on third-party or improperly licensed data, downstream assets can carry contamination risk. Confirm data lineage.
- Hallucination and validation. Generative models produce plausible-looking outputs that may be chemically or biologically invalid. The only defense is experimental validation — which is why the credible platforms are wet-lab heavy, not just compute-heavy.
Build, Buy, or Partner?
For a pharma or biotech deciding how to access generative AI, there are three routes, and the market is using all three at once:
- Build. Stand up an internal generative-AI group and compute. Eli Lilly’s up-to-$1B supercomputer collaboration with NVIDIA is the extreme version — appropriate only for the largest players with the data and talent to justify it.
- Buy / license capability. Bring a platform’s tooling in-house, as Lilly did with Chai Discovery, or take a subscription-style license, as GSK did with Noetik. This suits companies that want the capability embedded in their own workflows.
- Partner on assets. Run a multi-target research collaboration (the Isomorphic template) where the platform designs and the pharma develops. This is the dominant model and the lowest-commitment way to access a frontier platform.
The right answer depends on how core AI is to the strategy and how much proprietary data the buyer brings. Most companies should partner first and build only where they have a genuine data advantage — generative models without proprietary data and wet-lab loops rarely justify the cost of building. For the structures behind each route, see our AI Drug Discovery Deal Tracker.
Where Generative AI Falls Short
A credible view of generative AI is as clear about its limits as its promise. Four constraints recur, and each should temper how much a deal pays for an AI story:
- Biology, not chemistry, is the hard part. Generative models are strongest at designing molecules with desired chemical properties. They are far weaker at predicting how a molecule behaves in a living system — efficacy, toxicity, and clinical response remain stubbornly empirical.
- Data scarcity in the places that matter. Models learn from data; for novel targets and rare diseases, the relevant data is thin or absent, exactly where the value would be highest.
- Validation is still wet-lab-bound. Every generated candidate must be synthesized and tested. AI changes the ratio of ideas to experiments — it does not remove the experiments.
- Garbage in, confident garbage out. Generative models produce fluent, plausible outputs even when wrong, which can lend false confidence to a flawed candidate or hypothesis.
The mature position: generative AI is a powerful accelerant of the earliest, cheapest stages of discovery, and a far weaker predictor of the expensive, late-stage outcomes that determine a drug’s value. Price it accordingly.
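A back-of-the-envelope valuation shows why the distinction matters for price. The Python sketch below uses a deliberately simplified risk-adjusted NPV (one payoff at launch, no development costs), and every input is a hypothetical placeholder, not a benchmark.

```python
def risk_adjusted_npv(value_at_launch, prob_success, years_to_launch, discount_rate):
    """Simplified rNPV: probability-weighted launch value, discounted to today.

    Ignores development costs and cash-flow timing; illustration only.
    """
    return prob_success * value_at_launch / (1 + discount_rate) ** years_to_launch


# Hypothetical inputs: $1,000M value at launch, 10% discount rate.
baseline = risk_adjusted_npv(1000.0, 0.10, 12, 0.10)     # 10% odds, 12 years
faster = risk_adjusted_npv(1000.0, 0.10, 9, 0.10)        # same odds, 3 years sooner
better_odds = risk_adjusted_npv(1000.0, 0.20, 12, 0.10)  # doubled odds, same timeline

print(f"Faster discovery, same odds: {faster / baseline:.2f}x baseline")   # 1.33x
print(f"Same timeline, doubled odds: {better_odds / baseline:.2f}x baseline")  # 2.00x
```

Under these assumptions, shaving three years off the timeline lifts value by about a third, while doubling the probability of success doubles it. Speed is worth paying for, but an AI story priced as if it had improved the odds needs evidence that it actually did.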
The Dealmaker’s Lens
Put commercially: generation that yields novel, patentable, well-documented assets is worth paying for; efficiency-only predictive tooling rarely justifies a premium on its own. When you evaluate a generative-AI opportunity, separate the two, verify the IP and data foundations, and discount any claim that hasn’t been validated in the lab or clinic.
Timing matters too. The window to engage frontier generative platforms on favorable terms is widest before their first clinical proof point and narrowest after — the same dynamic that made Insilico’s pre-readout partnerships look prescient in hindsight. For an asset-holder, the mirror image applies: an AI-originated molecule with clean IP and early human data commands a premium precisely because the market has learned that AI design can translate. The edge, as ever, goes to whoever can value the evidence correctly while it is still contested.
For the broader market and deal-structure context, see our AI in Drug Development: The Dealmaker’s Guide; for who is building these platforms, the AI Drug Discovery Companies power list; and for structuring the resulting deals, our cross-border licensing term sheet guide.
Vision Lifesciences diligences and structures generative-AI and cross-border deals, including the IP and data questions that decide whether an AI asset is fundable.