3995 shaares
64 private links
64 private links
We just can't leave the topic of how the big model makers are building their training corpus unaddressed. This is both an ethics and economics problem. The creators of the content used to train such large models should be compensated in a way.
Between this, the crawlers they use and the ecological footprint of the data centers, there are so many negative externalities to those systems that law makers should have cease the topic a while ago. The paradox is that if nothing is done about it, the reckless behavior of the model makers will end up hurting them as well.