Success rate up sevenfold! A new method lets AI generate a molecular design and its synthesis steps from a single sentence

Revolutionary breakthroughs in AI technology have greatly improved the efficiency of molecular design and synthesis.
Core content:
1. Graph-based models are combined with large language models in a multimodal technique for molecular design
2. Natural-language query parsing and intelligent module switching enable molecular design, rationale explanation, and synthesis-route planning
3. The success rate of viable synthesis plans rises sharply, saving the pharmaceutical industry substantial time and resources
Finding molecules with the properties needed for new drugs and materials is a cumbersome, expensive process that consumes substantial computing resources. Researchers often spend months winnowing a vast pool of candidate molecules down to a handful of targets.
Large language models like ChatGPT could streamline this process. However, there are technical barriers to getting a large language model to understand and reason about the atoms and chemical bonds that make up a molecule as readily as it handles the words in a sentence.
Researchers at MIT and the MIT-IBM Watson AI Lab have developed a promising approach that augments a large language model with graph-based models designed for generating and predicting molecular structures.
After a base large language model parses the user's natural-language request, the method intelligently switches among AI modules for molecular design, rationale explanation, and synthesis-route planning.
It interleaves the generation of text, graphs, and synthesis steps, merging words, graph fragments, and reactions into a common vocabulary the large language model can consume, achieving seamless integration of multimodal information.
Compared with existing methods based on large language models, the molecules generated by this multimodal technique better match user-set specifications, and the success rate of viable synthesis plans rises from 5% to 35%.
The approach outperformed large language models more than 10 times its size that use only text representations to design molecules and synthesis routes, suggesting that multimodal fusion is key to the new system's success.
"This has the potential to be an 'end-to-end' solution that can automate the entire process of molecular design and synthesis. If the large language model can give answers within seconds, it will save pharmaceutical companies a lot of time," said Michael Sun, a graduate student at MIT and co-author of the paper on the technology.
The research will be presented at the International Conference on Learning Representations. Co-authors of the paper include Notre Dame graduate student Gang Liu, MIT professor of electrical engineering and computer science Wojciech Matusik, and MIT-IBM Watson AI Lab senior scientist Jie Chen. The research was funded in part by the National Science Foundation, the Office of Naval Research, and the MIT-IBM Watson AI Lab.
Complementary advantages
Large language models are not designed to understand the subtleties of chemistry, which is one reason they struggle with inverse molecular design, the process of identifying molecular structures with specific functions or properties.
Large language models convert text into representations called tokens, which they use to predict the next word in a sentence, one after another. But molecules are graph structures of atoms and chemical bonds with no inherent ordering, which makes them difficult to encode as sequential text.
Graph-based models, by contrast, represent atoms and bonds as the interconnected nodes and edges of a graph. Although these models are widely used for inverse molecular design, they require complex inputs, cannot understand natural language, and can produce results that are hard to interpret.
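To make the contrast concrete, here is a minimal illustrative sketch in plain Python (not code from the paper): the same ethanol molecule written as a SMILES token sequence, the form a language model consumes, versus the node-and-edge graph a graph-based model operates on.

# Ethanol (CH3-CH2-OH) in two representations, for illustration only.

# 1. Sequential text: a SMILES string, tokenized the way an LLM sees it.
smiles = "CCO"
tokens = list(smiles)          # ['C', 'C', 'O']: an ordered sequence

# 2. Graph: atoms as nodes, bonds as edges, with no inherent ordering.
atoms = {0: "C", 1: "C", 2: "O"}               # node id -> element
bonds = [(0, 1, "single"), (1, 2, "single")]   # (node, node, bond type)

# The edge list can be permuted freely without changing the molecule,
# which is exactly what makes graphs awkward to flatten into text.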
MIT researchers integrated large language models and graph-based models into a unified framework, leveraging their complementary strengths.
Llamole (Large Language Model for Molecular Discovery) uses a base large language model as an “intelligent dispatcher” to interpret user queries, that is, plain-language requests for molecules with specific properties.
For example, a user might ask for a molecule with a molecular weight of 209 and certain bond characteristics that can penetrate the blood-brain barrier and inhibit HIV.
As the large language model predicts text in response to a user query, the system switches among three functional modules via a special "trigger token" mechanism (a minimal dispatch loop is sketched after the quote below):
1. Structure generation module: a graph diffusion model that builds a molecular skeleton from the input conditions
2. Semantic conversion module: a graph neural network that re-encodes the molecular structure into tokens the large language model can understand
3. Synthesis planning module: predicts reaction paths from intermediate structures and works backward to derive a complete synthesis plan, from basic raw materials to the target molecule
“The subtlety here is the information loop between modules. Everything the large language model has generated before activating a specific module is fed into that module, and the module then works just as it would on its own,” said Michael Sun. “Likewise, each module’s output is encoded and fed back into the large language model’s generation process, so the model understands what each module did and can continue predicting tokens based on that data.”
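The loop Sun describes can be summarized in a short sketch. This is an illustrative Python outline under stated assumptions, not Llamole's actual code: the trigger-token values, module names, and the llm_generate callable are all hypothetical stand-ins.

# Illustrative sketch of a trigger-token dispatch loop; all names
# (TRIGGERS, llm_generate, the module callables) are hypothetical
# stand-ins, not Llamole's real implementation.

TRIGGERS = {
    "<design>": "structure_generation",  # graph diffusion model
    "<encode>": "semantic_conversion",   # graph neural network
    "<retro>": "synthesis_planning",     # retrosynthetic planner
}

def run_pipeline(prompt, llm_generate, modules):
    """Generate tokens with the LLM; on a trigger token, hand the full
    context to the matching module and feed its encoded output back."""
    context = prompt
    while True:
        token = llm_generate(context)   # next-token prediction
        if token is None:               # end of generation
            return context
        if token in TRIGGERS:
            module = modules[TRIGGERS[token]]
            # Everything generated so far conditions the module, and
            # the module's encoded output re-enters the LLM's context.
            context += module(context)
        else:
            context += token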
Better and simpler molecular structures
Ultimately, Llamole outputs an image of the molecular structure, a text description of the molecule, and a step-by-step synthesis plan that details how to synthesize the molecule, including the specific chemical reactions.
In experiments designing molecules that meet user specifications, Llamole outperformed 10 standard large language models, 4 fine-tuned large language models, and state-of-the-art domain-specific methods. It also increased the success rate of retrosynthetic planning from 5% to 35% by generating higher-quality molecules, which means that these molecules have simpler structures and lower-cost building blocks.
"It is difficult for large language models on their own to determine how to synthesize molecules because it requires a lot of multi-step planning. Our method can generate better and more easily synthesized molecular structures," said Gang Liu.
To train and evaluate Llamole, the researchers constructed two datasets from scratch, since existing molecular structure datasets lacked sufficient detail. They augmented hundreds of thousands of patent molecules with AI-generated natural language descriptions and custom description templates.
The dataset they built for fine-tuning the large language model contained templates associated with 10 molecular properties, so a limitation of Llamole is that it was trained to design molecules considering only these 10 numerical properties.
In future studies, the researchers hope to expand Llamole's capabilities to allow it to take into account any molecular properties. In addition, they plan to improve the graphical module and increase Llamole's retrosynthetic success rate.
In the long term, they hope to use this approach to expand its application beyond the molecular realm to create large, multimodal language models that can process other graph-based data, such as data from interconnected sensors in power grids or transaction data in financial markets.
“Llamole demonstrates the feasibility of using large language models as an interface for processing complex data beyond text descriptions, and we expect they will become the basis for interacting with other AI algorithms to solve a wide range of graph problems,” said Jie Chen.