Might be an interesting trick to reduce the computation and energy costs of large language models. Let's see if it gets replicated and generalized: this is a single short paper, not peer-reviewed anywhere as far as I can tell.