Discussion about this post

Calvin McCarter

The fact that batch effects are typically confounded with the underlying biology has also bothered me for years. It's especially problematic that most batch effect correction methods (e.g., those that match distributions using the maximum mean discrepancy) don't adjust for such confounding. And methods like ComBat that do let you include confounders are quite limited, because they assume that (1) the effect of the confounder(s) on the features is linear, and (2) you have access to the confounder's value for all samples, including new test samples. The latter assumption is especially problematic, because the confounder is often the very output variable that we want to predict on new test samples.

I've been working on this as a hobby side project for the last few years, and recently published a paper and released software that address it. The basic idea is that you train conditional generative models for the features given the confounder(s), then match the conditional distributions of the features given the confounder(s), instead of matching the marginal distributions of the features. Here's a link: https://openreview.net/pdf?id=GSp2WC7q0r. Apologies for shilling my work on your Substack! :)
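As a rough illustration of matching conditional rather than marginal distributions, here is a minimal sketch. It is not the method or the software from the linked paper: it swaps in a deliberately simple linear-Gaussian model of the features given the confounder as a stand-in for a conditional generative model, and every name in it (`fit_conditional_gaussian`, `conditional_mean`, `match_conditional`, the synthetic data) is hypothetical.

```python
import numpy as np

def fit_conditional_gaussian(x, c):
    """Fit x | c ~ N(design @ coef, diag(sigma^2)) by least squares.

    Assumption: a linear-Gaussian stand-in for a conditional generative model.
    """
    design = np.column_stack([c, np.ones(len(c))])  # confounder(s) plus intercept
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    resid = x - design @ coef
    sigma = resid.std(axis=0) + 1e-8  # per-feature residual scale
    return coef, sigma

def conditional_mean(model, c):
    """Predicted feature means for the given confounder values."""
    coef, _ = model
    design = np.column_stack([c, np.ones(len(c))])
    return design @ coef

def match_conditional(x_src, c_src, src_model, tgt_model):
    """Map source samples so that x | c follows the target batch's fitted model."""
    # Standardize each sample relative to the source conditional distribution...
    z = (x_src - conditional_mean(src_model, c_src)) / src_model[1]
    # ...then re-express it under the target conditional distribution.
    return conditional_mean(tgt_model, c_src) + z * tgt_model[1]

# Synthetic data (hypothetical): the two batches share the same biology
# (true_w) but differ in a batch offset, in noise scale, and in the
# distribution of the confounder itself.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
c_tgt = rng.normal(0.0, 1.0, size=200)
c_src = rng.normal(1.5, 1.0, size=200)
x_tgt = np.outer(c_tgt, true_w) + rng.normal(0.0, 1.0, size=(200, 3))
x_src = np.outer(c_src, true_w) + 3.0 + rng.normal(0.0, 2.0, size=(200, 3))

src_model = fit_conditional_gaussian(x_src, c_src)
tgt_model = fit_conditional_gaussian(x_tgt, c_tgt)
x_corrected = match_conditional(x_src, c_src, src_model, tgt_model)
```

Because the correction is applied relative to the conditional mean, the corrected source features keep the shift that comes from the batches having different confounder distributions, which marginal matching would erase. Note that this toy version still needs the confounder value for every sample it corrects.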

