71 private links
And yet another reverse proxy to use as a scraper deterrent... It looks like several are popping up every week lately.
Don't underestimate how much skill it takes to avoid making a stupid crawler...
When a big player has to prepare a labyrinth of AI-generated content to trap the bots that feed generative-AI training pipelines... something feels wrong.
Despite the marketing speak... it's definitely not there yet. So far all the attempts at using LLMs for coding larger pieces end in this kind of messy result. It does help kickstart a project, but things quickly degenerate after that.
More details about the impacts of the LLM companies acting like vandals... This is clearly widespread, and it's creating work for everyone for nothing.
Still so reliable... could we confine this to NLP uses, please? It should never have been used for anything that looks remotely like search.
Those bots are really becoming the scourge of the Internet... Is it really necessary to DDoS every forge out there to build LLMs? And that's not even counting all the other externalities; the end of the article makes it clear: "If blasting CO2 into the air and ruining all of our freshwater and traumatizing cheap laborers and making every sysadmin you know miserable and ripping off code and books and art at scale and ruining our fucking democracy isn’t enough for you to leave this shit alone, what is?"
Words are important. I'm dismayed that the marketing speak around generative AI is the vocabulary people actually use... it completely muddies the thinking around these systems.
More small-footprint models are becoming available. This is getting interesting.
So much data trapped in PDFs indeed... Unfortunately VLMs are still not reliable enough to be unleashed without tight validation of their output.
I like this kind of research, as it also says something about our own cognition. The results from comparing the two models and improving them are fascinating.
Are we surprised? Not really... This kind of struggle was an obvious outcome of the heavy dependencies between the two companies.
Here we go: a brand new marketing stunt from OpenAI. You can also tell the pressure is rising, since all of this still operates at a massive loss.
Friendly reminder that AI was also supposed to be a field about studying cognition... There are so many things we still don't understand that the whole "make it bigger and it'll be smart" obsession looks like it's creating missed opportunities to understand ourselves better.
This is one of the handful of uses where I'd expect LLMs to shine. It's nice to see some tooling to make it easier.
Early days but this looks like an interesting solution to democratize the inference of large models.
I like this paper; it's well balanced. The conclusion says it all: if you're not actively working on reducing the harms, then you might be doing something unethical. It's not just a toy to play with; you have to think about the impacts and actively reduce them.
Interesting research, looking forward to the follow-ups to see how it evolves over time. For sure, the number of issues is still way too high to build trustworthy systems around search and news.
This might be accidental, but it highlights the lack of transparency in how those models are produced. It also means we should get ready for future generations of such models to turn into very subtle propaganda machines. Even if it's accidental for now, I doubt it will stay that way much longer.
This is definitely a problem. It's bound to influence how technologies are chosen on software projects.