Profluent - Scaling AI in Biology
Scale the transformer in language, learn the blueprint of language. Scale the transformer in biology, learn the blueprint of nature.
In 2019 I sent a cold email to Ilya Sutskever. In hindsight it’s rather humorous: I thought he’d be interested in reading Rich Sutton’s The Bitter Lesson along with my thoughts on how dramatically I believed the world was going to change as people scaled AI. He responded warmly, asking to meet at Stable Cafe.
This was part of a two-week stretch that is among the most electric of my career. I would read research papers late into the evening, stopping only to eagerly tell my wife about the elegance of masked language modeling and what it pointed to. I tinkered with fine-tuning BERT, joyously taunting the best ML engineer I knew that he was going to be unemployed soon (he’s now at OpenAI).
We have seen what happens when the scaling hypothesis is applied to language.
Last summer I had the joy of experiencing this feeling again when I met Ali Madani and learned what’s happening as the scaling hypothesis is pursued in biology.
In January 2023 Ali and his team published research detailing for the first time how large AI models trained on biological data can generate complete proteins that don’t exist in nature yet function as well as proteins that evolved over millions of years. This model was equivalent in size to GPT-2, a model that was released in 2019.
Large AI models in biology exhibit traits similar to those observed in language models: the base model can be fine-tuned to improve performance in specific domains, and new capabilities emerge with scale.
Last month the New York Times covered a recent release from Ali’s company, Profluent. Profluent has trained an LLM on a massive amount of biological data and now has an AI model that can design gene editors that do not exist in nature. Profluent has shown that a gene editor designed entirely from scratch by AI can successfully edit human DNA. And they are open-sourcing an AI-generated gene editor for free use. This is truly remarkable for many reasons, but particularly for the future it points to as the scaling hypothesis plays out in biology.
I am so delighted to be involved with Ali and Profluent as they advance the frontier of AI research in biology and work to make biology programmable. You can read more about our investment on Spark’s site.