When NVIDIA chief Jensen Huang spoke about using ChatGPT to understand how generative AI could help solve real-world problems such as dissolving plastics and reducing carbon emissions, little did we know that a European AI startup would apply LLMs to DNA and protein sequences to address this very problem. Now, actual use cases are emerging.
“About 60% of the things that we consume today, whether they’re drugs or food or chemicals, you could be making through biological means. That just felt a lot more impactful than some of the other applications that people were working on,” said Stef van Grieken, co-founder and CEO of Cradle, in an exclusive interaction with AIM.
Engineering Biology with LLMs
Cradle is a European biotech startup that uses AI to help scientists design and engineer proteins faster and more cost-effectively. The startup focuses on engineering protein modalities such as enzymes, vaccines, peptides, and antibodies with the help of generative AI.
Much like ChatGPT, where you give it an equation and get an answer, or a diffusion model, where you give it a prompt and get a picture, Cradle's platform takes a DNA sequence or a description of what a molecule looks like, along with what the molecule needs to do: for instance, bind to a particular target on the cell, be stable, or be soluble in water.
“What it does is it generates another set of sequences that you can bring into your laboratory that have a much higher probability of doing that,” said Grieken. “Instead of diffusing a picture, you’re diffusing a molecule.”
Just as language models can be trained by infilling, in which words are removed from sentences and the model is asked to fill them back in, Cradle's models are trained in a similar way, except that the infilling is done on DNA and protein sequences.
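To make the infilling idea concrete, here is a minimal sketch of how a training example might be built from a protein sequence. It is an illustration only, not Cradle's actual pipeline: the function names, the "_" mask symbol, the 15% masking rate and the example sequence are all assumptions chosen for readability.

```python
import random

MASK = "_"  # hypothetical placeholder for a hidden residue


def mask_sequence(sequence, mask_fraction=0.15, seed=0):
    """Hide a random subset of residues in a non-empty sequence.

    Returns the masked string and a dict mapping each hidden
    position to the original residue the model should recover.
    """
    rng = random.Random(seed)
    # Always hide at least one residue so there is something to predict.
    n_masked = max(1, round(len(sequence) * mask_fraction))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    targets = {i: sequence[i] for i in positions}
    masked = "".join(MASK if i in targets else ch for i, ch in enumerate(sequence))
    return masked, targets


def fill_in(masked, predictions):
    """Write predicted residues back into the masked sequence."""
    chars = list(masked)
    for i, residue in predictions.items():
        chars[i] = residue
    return "".join(chars)


# Illustrative amino-acid string (one letter per residue), not real project data.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
masked, targets = mask_sequence(protein)
print(masked)   # the sequence with some residues hidden
print(targets)  # what a model would be trained to predict
```

During training, a model would see only the masked string and be scored on how well it recovers the hidden residues; a perfect prediction passed to fill_in reproduces the original sequence.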
With these models, both the number of improvements that surpass previous benchmarks and the size of those improvements are roughly double what previous methods achieve. “This means that you reach your target twice as fast over the duration of an R&D project,” said Grieken.
“A lot of the work that companies like Google, Facebook and others are doing is more in machine learning research and development. They’re not trying to build tools that help biologists use these types of methods in a sort of easy fashion,” he said.
Cradle builds proprietary models that draw inspiration from open-source models such as the Transformer-based BERT. “In terms of technology capabilities in biology, such as molecular biology, we’re still much like GPT 0.5,” he said.
Data and Feedback Loops Remain Challenging
The scarcity of protein data slows the development of such models, especially when compared with GPT models, which can be trained on all the information available on the internet. “Training these models on public data is really hard to do. It’s one of the reasons why we have our in-house laboratory to effectively build training sets for these machine learning models to learn faster,” said Grieken.
The slow feedback loops for these models also impede progress. Grieken contrasts the process with GPT models, where instant feedback on whether a generated result is right, wrong, or bad can be used to train the model right away. “In our case, it takes three months between the thing being generated and results coming back,” he said. Furthermore, generating results is expensive, with costs ranging from $30 to thousands of dollars per data point.
To Make the World A Better Place
Cradle addresses a number of real-world problems in medical research, especially around time, cost, and logistics. Many vaccines, for example, are hard to distribute in parts of the world because they depend on cold storage and distribution networks.
“If you can develop certain drugs that work at room temperature, you can bring them to more places in the world, which is helpful, so you can end up with a better product,” said Grieken.
Grieken also believes that if the time and money required to bring out cures for diseases, or to move from petrochemical-based products to bio-based ones, are reduced, many more such products will enter the market.
“I have two small daughters. Twenty years from now they will ask me what I did when the Earth caught fire, and the answer, ‘I was working for an advertising company’, is probably not the best one. So, try to make use of my time,” joked Grieken when asked about the inspiration to start Cradle.
Drawing on his extensive experience at a big-tech company like Google, Grieken recommends that everyone work for a large tech company for a while and then go build something else once they have gathered the learnings.
“I’m incredibly grateful to Google. First of all, they teach you how to do engineering. Secondly, I was fortunate to be at Google around the time when language models started to emerge,” said Grieken, who considers himself incredibly lucky to have been there in the early days.
Cradle has raised a total of $29.7 million in funding and has two offices: one in Amsterdam, the Netherlands, and one in Zurich, Switzerland.