63 private links
Maybe it's time to stop obsessing about scale and distributed architectures? Hardware has improved quite a bit in the right places, especially storage.
This is indeed a metaphor that should be more common in enterprise software.
Interesting selection of options for modeling data structures with some variability in Rust.
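The article's specific options aren't reproduced here, but as a minimal sketch of one common approach: an enum makes the variability explicit and lets the compiler check that every case is handled. The `ConfigValue` type and its variants are hypothetical names for illustration.

```rust
// Hypothetical example: a value that can take several shapes.
// Each variant carries its own payload type.
enum ConfigValue {
    Flag(bool),
    Count(u64),
    Label(String),
}

// Exhaustive matching: adding a variant later becomes a compile error
// here until the new case is handled.
fn describe(v: &ConfigValue) -> String {
    match v {
        ConfigValue::Flag(b) => format!("flag: {}", b),
        ConfigValue::Count(n) => format!("count: {}", n),
        ConfigValue::Label(s) => format!("label: {}", s),
    }
}

fn main() {
    let values = vec![
        ConfigValue::Flag(true),
        ConfigValue::Count(42),
        ConfigValue::Label("release".to_string()),
    ];
    for v in &values {
        println!("{}", describe(v));
    }
}
```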
A look back at the limitations of deep learning in the context of computer vision. We're better at avoiding overfitting nowadays, but the shallowness of the available data is still a problem.
Looking only at averages quickly hides patterns. Make sure distributions are visible in some fashion.
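A quick sketch of the point, with made-up numbers: two datasets can share the same mean while having very different spreads, so reporting the mean alone conceals the difference.

```rust
// Arithmetic mean of a slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

// Population standard deviation: a simple measure of spread.
fn std_dev(xs: &[f64]) -> f64 {
    let m = mean(xs);
    (xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / xs.len() as f64).sqrt()
}

fn main() {
    let tight = [4.0, 5.0, 6.0];
    let spread = [0.0, 5.0, 10.0];
    // Identical means...
    println!("means: {} vs {}", mean(&tight), mean(&spread));
    // ...but very different distributions.
    println!("std devs: {:.2} vs {:.2}", std_dev(&tight), std_dev(&spread));
}
```

Both sets average to 5, yet their standard deviations differ by a factor of five, which is exactly the kind of pattern a lone average hides.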
Nice piece to give ideas about what type of diagram to consider depending on what you're exploring.
Interesting class of data structures with funny properties. Looks like there's a lot to do with them.
This is one of the handful of uses where I'd expect LLMs to shine. It's nice to see some tooling to make it easier.
A nice extension for Postgres that eases protecting personal information.
OK, the numbers are indeed impressive. And its API is apparently fully compatible; looks like a good replacement if you have Pandas code around.
It shouldn't be, but it is a big deal. Having such a training corpus openly available is one of the big missing pieces for building models.
Interesting dimensions to use when classifying syncing solutions and to see which ones will meet your constraints.
Excellent piece: we're a civilisation whose culture is built on shifting sands and... plastic toys. Guess what will survive us?
More discussion about model collapse. The provenance of data will become a crucial factor in our ability to train further models.
The more releases are out there, the more vulnerabilities are (and could be) discovered. Some action is necessary to get things under control properly.
No, your model won't get smarter just by throwing more training data at it... on the contrary.
The training dataset crisis is looming in the case of large language models. They'll sooner or later run out of genuine content to use... and the generated toxic waste will end up in training data, probably leading to dismal results.
Might be a good alternative to JSON in some cases.
Looks like a nice tool for quick data exploration straight from the command line.
Looks like an interesting protocol for resilient peer-to-peer data stores. Let's see how it spreads.