Discussion about this post

ScienceGrump

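The remove common essentials function looks like it silently failed. The mistake is that it extracts the gene symbol from the gene label before filtering, so the names being filtered no longer match the names in the common essentials list. The tell is that genes like RPS8 (a ribosomal protein) and POLR2E (an RNA polymerase subunit) show up in the hit list.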
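A minimal sketch of how this kind of filter can fail without raising an error, assuming gene labels of the form `SYMBOL (ID)`. This is one plausible reading of the bug, not the original code: `symbol_of`, the toy lists, and the numeric IDs are all made up for illustration.

```python
import re


def symbol_of(label):
    """Return the gene symbol from a label assumed to look like 'SYMBOL (ID)'."""
    return re.sub(r"\s*\(.*\)$", "", label)


# Toy data: the numeric IDs are placeholders, not real identifiers.
hit_labels = ["RPS8 (101)", "POLR2E (102)", "GENE_A (103)"]
common_essential_labels = {"RPS8 (101)", "POLR2E (102)"}

# Buggy order: symbols are extracted first, then compared against full labels.
# Nothing matches, so nothing is removed, and no error is raised.
symbols = [symbol_of(g) for g in hit_labels]
buggy = [g for g in symbols if g not in common_essential_labels]

# Fixed: normalize both sides to the same key before comparing.
essential_symbols = {symbol_of(g) for g in common_essential_labels}
fixed = [g for g in symbols if g not in essential_symbols]

# Comparing sizes before and after the filter makes the silent failure visible.
print(len(symbols), len(buggy), len(fixed))
```

If the count after filtering equals the count before, the filter did nothing, which is exactly the kind of thing a quick size check in a notebook would catch.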

You can say that manual code is filled with mistakes like this, and that's true. But if you are doing this in a notebook (as you should), you're a lot more likely to do things like check the shape of your data as you filter and notice that a transform didn't do anything. You're also much more likely to make plots to help decide if your thresholds make sense, and to see that only 16 noncancerous lines are in the data, and that 10 of those are prostate and eye, which would raise some concerns! (It's not surprising that you still see enrichment for chemotherapy targets. If you intersect moderately pan-lethal knockouts with druggable targets, I expect you'd get a similar enrichment.)

As far as time is concerned, it took me ten minutes and 12 lines of code to get a hit list of genes sorted by the fraction of cancer cell lines with scores < -0.5, excluding genes with a mean < -0.3 in noncancerous models and correctly filtering common essentials. That includes going to the DepMap website, downloading the data, and starting a notebook. LLMs can have value for overcoming the activation energy to start a project, but are you sure they're actually saving you the time you think they are? I have never, ever had a project where literally typing the code was the limiting factor. It's always deciding what I *should* do. LLMs will do that for you too, of course, and that's exactly the problem.
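For concreteness, the filtering described above could be sketched roughly like this. It is not the commenter's actual 12 lines: the data layout, the `hit_list` function, the cell line names, and every score are invented, and only the two thresholds come from the comment.

```python
from statistics import mean

# Toy gene-effect scores as {gene: {cell_line: score}}. All values are invented.
scores = {
    "GENE_A": {"C1": -1.2, "C2": -0.8, "C3": -0.1, "N1": 0.0, "N2": -0.1},
    "GENE_B": {"C1": -0.9, "C2": -0.2, "C3": 0.1, "N1": -0.2, "N2": 0.0},
    "GENE_C": {"C1": -1.0, "C2": -1.1, "C3": -0.9, "N1": -0.6, "N2": -0.4},
    "RPS8": {"C1": -1.5, "C2": -1.4, "C3": -1.6, "N1": -0.2, "N2": -0.1},
}
cancer_lines = {"C1", "C2", "C3"}
normal_lines = {"N1", "N2"}
common_essentials = {"RPS8"}  # keyed the same way as `scores`


def hit_list(scores, cancer_lines, normal_lines, common_essentials,
             dep_cut=-0.5, normal_cut=-0.3):
    """Rank genes by the fraction of cancer lines with score < dep_cut."""
    rows = []
    for gene, by_line in scores.items():
        if gene in common_essentials:
            continue  # drop common essentials
        cancer = [s for line, s in by_line.items() if line in cancer_lines]
        normal = [s for line, s in by_line.items() if line in normal_lines]
        if not cancer or not normal:
            continue  # cannot evaluate the gene without both groups
        if mean(normal) < normal_cut:
            continue  # too harmful in noncancerous models
        frac = sum(s < dep_cut for s in cancer) / len(cancer)
        rows.append((gene, frac))
    return sorted(rows, key=lambda row: row[1], reverse=True)


hits = hit_list(scores, cancer_lines, normal_lines, common_essentials)
for gene, frac in hits:
    print(f"{gene}\t{frac:.2f}")
```

In this toy data RPS8 passes the noncancerous-mean cutoff, so only the common essentials list removes it. That is why a broken essentials filter shows up directly in the hit list.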

They're most valuable for total unknowns. Like if you have no idea what data might exist to answer some question, they can point you to some large public dataset and immediately orient you to it. That's very valuable. But this example mostly reconfirms for me that just vibecoding the analysis is a bad idea.

Phil Oliver

I'm not familiar with DepMap beyond your article and taking a quick look at the site, so I may be missing it - but to get back to your original thought, it seems pretty important to me to ask whether the essential genes in cancer are also especially important to healthy cells. If I were focusing on the problem, I'd want to take a close look at the essential cancer genes and identify ones that are of least importance to normal metabolism.
