63 private links
If there was still any doubt that the arguments coming from the big model providers were lies... Yes, you can train large models using a corpus of training data whose licenses you respect. With the diminishing returns in performance of the newer families of models, the performance they got from the model trained on that corpus is not bad at all.
Not only do the tools have ethical issues, but the producers just pretend "we'll solve it later". A bunch of empty promises.
LLMs are indeed not neutral. There's a bunch of ethical concerns over which you have no control when you use them.
Nice little satire; we could easily imagine some CEOs writing this kind of memo.
Sourcehut pulled the trigger on their crawler deterrent. Good move, good explanations of the reasons too.
We just can't leave unaddressed the topic of how the big model makers are building their training corpora. This is both an ethics and an economics problem. The creators of the content used to train such large models should be compensated in some way.
Between this, the crawlers they use and the ecological footprint of the data centers, there are so many negative externalities to those systems that lawmakers should have seized on the topic a while ago. The paradox is that if nothing is done about it, the reckless behavior of the model makers will end up hurting them as well.
Sure, a filter which turns pictures into something in the Ghibli style looks cute. But make no mistake, it has ulterior political motives. They need a distraction from their problems and it's yet another way to breach a boundary. Unfortunately I expect people will comply and use the feature with enthusiasm...
Once again the music labels can't understand the cultural value of building archives. Let's hope they lose the lawsuit.
You love artists and their music? You probably should get off Spotify then... because they're clearly at war to reduce even further how much they pay artists. Clearly it's not about discovering artists anymore, it's about pumping out cheap stock music to increase their margin. It's also clear the remaining musicians trapped in that system will be automated away soon... you don't need humans to create soulless music.
Another lawsuit making progress against OpenAI and their shady practices.
More shady practices to try to save themselves. Let's hope it won't work.
It shouldn't be, but it is a big deal. Having such a training corpus openly available is one of the big missing pieces for building models.
Putting things in the public domain voluntarily is indeed more difficult than it should be. The best tool we have is CC0, but it still raises (probably unwarranted) concerns for software.
This is really bad news... Clearly the publishers cartel would try to outlaw libraries if they were invented today.
It's good to see major institutions like this get out of contracts with scientific publishing companies. Those unfortunately became mostly parasitic. Open access should be the norm for research.
Interesting questions and state of the art around model "unlearning". This became important due to the opacity of the data sets used to train some models. It'll also be important in any case for managing models over time.
Interesting: with the price hikes and bundles to come, we might indeed see a resurgence in physical media. It will stay niche for sure, but it looks like demand is about to grow.
Very interesting piece. The chances that it is another bubble are high. It's currently surviving on a lot of wishful thinking and hypotheticals. This really feels like borrowed time... I wonder what useful pieces will remain once it all collapses. Coding assistants are very likely to survive. Clearly there could be interesting uses in a more sober approach.
This is a nice ruling about a GPL violation in France. It gives some more weight to the GPL.
This is an interesting move, we'll see if this certification gets any traction.