63 private links
If there was still any doubt that the arguments coming from the big model providers were lies... Yes, you can train large models using a corpus of training data whose licenses you respect. With the diminishing returns in performance of the newer families of models, the performance they got from the model trained on that corpus is not bad at all.
Not only do the tools have ethical issues, but the producers just pretend "we'll solve it later". A bunch of empty promises.
LLMs are indeed not neutral. There's a bunch of ethical concerns over which you have no control when you use them.
Nice little satire; we could easily imagine some CEOs writing this kind of memo.
Sourcehut pulled the trigger on their crawler deterrent. Good move, good explanations of the reasons too.
We just can't leave unaddressed the topic of how the big model makers are building their training corpora. This is both an ethics and an economics problem. The creators of the content used to train such large models should be compensated in some way.
Between this, the crawlers they use and the ecological footprint of the data centers, there are so many negative externalities to those systems that lawmakers should have seized on the topic a while ago. The paradox is that if nothing is done about it, the reckless behavior of the model makers will end up hurting them as well.
Sure, a filter which turns pictures into something in the Ghibli style looks cute. But make no mistake, it has ulterior political motives. They need a distraction from their problems and it's yet another way to breach a boundary. Unfortunately I expect people will comply and use the feature with enthusiasm...
Once again the music labels can't understand the cultural value of building archives. Let's hope they lose the lawsuit.
You love artists and their music? You probably should get off Spotify then... because they're clearly at war to reduce even further how much they pay artists. Clearly it's not about discovering artists anymore, it's about pumping out cheap stock music to increase their margin. It's also clear the remaining musicians trapped in that system will be automated away soon... you don't need humans to create soulless music.
Another lawsuit making progress against OpenAI and their shady practices.
More shady practices to try to save themselves. Let's hope it won't work.
It shouldn't be, but it is a big deal. Having such a training corpus openly available is one of the big missing pieces for building models.
Putting things in the public domain voluntarily is indeed more difficult than it should be. The best tool we have is CC0, but it still raises (probably unwarranted) concerns for software.
This is really bad news... Clearly the publishers cartel would try to outlaw libraries if they were invented today.
It's good to see major institutions like this get out of contracts with scientific publishing companies. Those unfortunately became mostly parasitic. Open access should be the norm for research.
Interesting questions and state of the art around model "unlearning". This became important due to the opacity of the data sets used to train some models. It'll also be important in any case for managing models over time.
Interesting: with the price hikes and bundles to come, we might indeed see a resurgence in physical media. It will stay niche for sure, but it looks like demand is about to grow.
Very interesting piece. The chances that it is another bubble are high. It's currently surviving on a lot of wishful thinking and hypotheticals. This really feels like borrowed time... I wonder what useful pieces will remain once it all collapses. Coding assistants are very likely to survive. Clearly there could be interesting uses in a more sober approach.
This is a nice ruling about a GPL violation in France. It gives some more weight to the GPL.
This is an interesting move, we'll see if this certification gets any traction.