Imagine you could talk to cells and ask them what they’re up to. That’s more or less what Cell2Sentence-Scale (C2S-Scale) lets you do. Built by Google Research and Yale, it’s an open-source model that takes the huge, messy data from single-cell RNA sequencing—basically, a readout of which genes are switched on in each cell—and turns it into something a language model can understand. It translates all those numbers into 'cell sentences,' so you can use natural language to explore what’s happening inside your cells.

Under the hood, C2S-Scale is built on top of Google’s Gemma models, which come in all shapes and sizes—from the small and nimble to the absolutely gigantic. The trick isn’t in changing the model architecture itself, but in how you feed it the data. By turning gene activity into sentences and mixing in extra biological context, the model learns from over a billion tokens drawn from single-cell data and research papers.
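To make the "cell sentence" idea concrete, here's a minimal sketch in Python. The core trick is simply ranking a cell's genes from most to least expressed and writing out their names in that order; the gene names and counts below are made up for illustration.

```python
def cell_to_sentence(expression, top_k=5):
    """Turn a {gene_name: count} mapping into a rank-ordered gene string."""
    # Keep only genes that are switched on, then sort by expression, highest first.
    expressed = [(gene, count) for gene, count in expression.items() if count > 0]
    expressed.sort(key=lambda item: item[1], reverse=True)
    # The "sentence" is just the top gene names, space-separated, in rank order.
    return " ".join(gene for gene, _ in expressed[:top_k])

# Toy expression readout for one cell (invented numbers, real gene symbols).
cell = {"CD3D": 40, "CD8A": 25, "GZMB": 12, "ACTB": 90, "HBB": 0}
print(cell_to_sentence(cell))  # ACTB CD3D CD8A GZMB
```

Once expression is text like this, any language model can consume it the same way it consumes ordinary sentences.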

So what can you actually do with it? The models can tell you what kind of cell you’re looking at, dream up new cells and tissues, predict how cells will react if you poke them with a drug or tweak their genes, and even answer your questions in plain English. You can try them out on Hugging Face or dig into the code on GitHub. And, as you might guess, the bigger the model, the smarter it gets—the largest version can spot patterns and make connections that the smaller ones just can’t.
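Because everything is text, each of those tasks boils down to wrapping a cell sentence in a natural-language question. The templates below are purely hypothetical—the released models define their own prompt formats—but they show the shape of the interface:

```python
# Hypothetical prompt builders for two C2S-style tasks. The template wording
# is invented for illustration; the actual models expect their own formats.

def annotation_prompt(cell_sentence):
    """Ask the model to identify a cell from its rank-ordered gene list."""
    return (
        "The following is a cell sentence, listing a cell's genes from most "
        f"to least expressed: {cell_sentence}. What cell type is this?"
    )

def perturbation_prompt(cell_sentence, drug):
    """Ask the model to predict the cell's state after a treatment."""
    return (
        f"Given a cell with expression profile {cell_sentence}, predict the "
        f"cell sentence after treatment with {drug}."
    )

print(annotation_prompt("CD3D CD8A GZMB NKG7"))
```

The prompt string then goes to the model like any other text-generation request—no special biology-specific tooling required.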

Why does this matter? Until now, making sense of single-cell data has been a job for specialists with complicated tools. But with C2S-Scale, you can run quick, virtual experiments to see how cells might react to different drugs or tweaks—before you ever step into a lab. In fact, Yale and Google used it to screen more than 4,000 drugs on the computer and found a combo that boosted a key immune signal by about 50 percent when they tested it for real.
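A screen like that is, at its core, a ranking loop: ask the model for a predicted signal change per drug and keep the top hits. The sketch below shows that loop with a toy lookup standing in for the real model call; `predict_signal_boost` and all the scores are invented for illustration.

```python
# Sketch of an in-silico drug screen: score each candidate with a model's
# predicted boost to an immune signal, then rank. In a real screen,
# `predict_signal_boost` would prompt the model; here it's a toy lookup.

def screen(drugs, predict_signal_boost, top_n=3):
    """Rank candidate drugs by predicted signal boost, highest first."""
    ranked = sorted(drugs, key=predict_signal_boost, reverse=True)
    return ranked[:top_n]

# Made-up predicted boosts standing in for model output.
toy_scores = {"drugA": 0.12, "drugB": 0.51, "drugC": 0.08, "drugD": 0.33}
hits = screen(toy_scores, toy_scores.get)
print(hits)  # ['drugB', 'drugD', 'drugA']
```

The top hits from a loop like this are candidates for the wet lab, not answers—the Yale/Google result still required real experiments to confirm.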

If you want to try it out, start small. The 410M model is a good way to get a feel for how it works on your own data—it’ll tell you what kind of cells you have and explain its reasoning in plain language. If you need to predict how cells will respond to different treatments, the big 27B model is the one that can handle the tricky, context-dependent stuff, like figuring out which drugs only work in certain immune situations.

Since it’s built on Gemma, you don’t have to reinvent the wheel. You can use all the usual tools for training and deploying language models, instead of building something totally new for biology. The catch? This is really only useful if you’re already knee-deep in single-cell RNA data and want to speed up your experiments or test new ideas. It’s not a replacement for real biological know-how.

Access models and code at Google Research