Everyone makes mistakes; what matters is how you handle them.
A very in-depth review of the mess of a Matrix home server vibe coded at Cloudflare... all the way to the blog post announcing it. Unsurprisingly, this didn't go well and they had to cover their tracks several times. The response from the Matrix Foundation is a bit underwhelming: it's one thing to be welcoming, it's another to turn a blind eye to such obvious failures. This doesn't reflect well on either Cloudflare or the Matrix Foundation, I'm afraid.
Solving paper cuts pays off faster than you'd think.
Maybe we can expect improvements in how HTTP rate limiting is handled?
Error handling is still not a properly solved problem in my opinion. At least the Rust community discusses the topic quite a bit. This is good inspiration for other ecosystems as well I think.
It is indeed interesting to see how the landscape has evolved around error handling. There's clearly a tension between exceptions and the result types we now see popping up everywhere.
A bit too unapologetic regarding Rust API choices for my taste. Still, it gives a good idea of how error handling works in Rust.
You assumed you could deserialise in a zero-copy fashion? Are you really sure about that? Think twice.
A good explainer on what metastable failures are and how to try to mitigate them.
This is what you're signing up for with such ecosystems. You can't rely on them for backups, even though people are led that way. Sure, technically the data is safe on their infrastructure, but is your access to said infrastructure guaranteed? This gilded cage looks less like a gift when you lose access.
Very Rust-focused; still, it's an interesting debate. It gives a good overview of the different ways locks can behave when a failure occurs. It very much advocates for the poisoning approach, which is indeed an interesting one (with its own tradeoffs, of course).
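For readers unfamiliar with the term, here is a minimal sketch of what poisoning looks like with the standard library's `std::sync::Mutex`. It is not taken from the linked article; the function name `poison_and_recover` is made up for illustration. A thread panics while holding the lock, and the next caller has to decide explicitly what to do with the possibly inconsistent data.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: returns (was_poisoned, recovered_value) after a
/// worker thread panics while holding the lock.
fn poison_and_recover() -> (bool, i32) {
    let shared = Arc::new(Mutex::new(0));

    let worker = Arc::clone(&shared);
    // The worker panics while holding the guard, which poisons the mutex.
    let result = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        *guard = 42; // in real code the state could be half-updated here
        panic!("simulated failure while holding the lock");
    })
    .join();
    assert!(result.is_err());

    let was_poisoned = shared.is_poisoned();

    // lock() now returns Err(PoisonError). The caller chooses between
    // propagating the failure and accepting the data via into_inner().
    let value = match shared.lock() {
        Ok(guard) => *guard,
        Err(poisoned) => *poisoned.into_inner(),
    };
    (was_poisoned, value)
}

fn main() {
    let (was_poisoned, value) = poison_and_recover();
    println!("poisoned: {was_poisoned}, value: {value}");
}
```

The tradeoff mentioned above shows up in the `match`: every caller pays for an extra error path, in exchange for never silently reading state that a panicking thread may have left half-written.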
Our industry hasn't improved its track record in decades, and there are real consequences for users. Some more ethics would be welcome in our profession.
A bit of a shameless plug toward the end. That said, the explanations of why Cloudflare is banking on Rust so much and of how the recent downtime could have been avoided are spot on.
Error handling is not easy. Having simple rules to apply for complex systems is a good thing. Of course the difficulty is to apply them consistently.
Interesting point of view. Indeed, you probably want things to not be available 100% of the time. This forces you to see how resilient things really are.
Depending on the ecosystem it's more or less easy indeed. Let's remember that error handling is one of the hard problems to solve.
If it fails for everyone then it's not a bad choice on your part, right?
Everyone makes mistakes eventually; the real difference is in how you deal with them.
Clearly the error handling landscape is still evolving in Rust, and that's a good thing. The current solutions are too fragmented at the moment.
Matrix.org - How we discovered, and recovered from, Postgres corruption on the matrix.org homeserver
Wow, this was a really bad index corruption indeed.