We really have some nice facilities in the kernel nowadays to squeeze out extra performance.
Interesting, it looks like index scans in your databases can have surprising performance results with SSDs.
No big tricks to optimize your code here, but knowing the tooling knobs sometimes helps.
Indeed, CPU prefetchers are really good nowadays. Now you know what to do to keep your code fast.
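To give an idea of what that means in practice, here's a minimal sketch (mine, not code from the linked article): a linear walk lets the hardware prefetcher stream data in ahead of time, while visiting the same elements in a scattered order forces a cache miss on almost every load.

```rust
// Sketch (not from the linked article): sequential vs. scattered access
// over a buffer much larger than the caches.

fn main() {
    const N: usize = 1 << 24; // 16M u64s = 128 MiB, far bigger than any cache
    let data = vec![1u64; N];

    // Scattered visit order: a shuffled index array (simple xorshift-based shuffle).
    let mut order: Vec<u32> = (0..N as u32).collect();
    let mut state: u64 = 0x9E3779B97F4A7C15;
    for i in (1..N).rev() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        order.swap(i, (state % (i as u64 + 1)) as usize);
    }

    let t = std::time::Instant::now();
    let seq: u64 = data.iter().sum();
    println!("sequential: {seq} in {:?}", t.elapsed());

    let t = std::time::Instant::now();
    let scattered: u64 = order.iter().map(|&i| data[i as usize]).sum();
    println!("scattered:  {scattered} in {:?}", t.elapsed());
}
```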
SIMD instructions are indeed a must to get decent performance on current hardware.
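As a small illustration (again my own sketch, not the article's code), a dot product written with fixed-width lanes and independent accumulators gives the compiler a shape it can usually auto-vectorize; build with --release and check the assembly to confirm it actually did.

```rust
// Sketch: a dot product structured so the optimizer can map the inner
// loop onto SIMD registers (one accumulator per lane, scalar tail at the end).

fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];

    let chunks = a.len() / LANES * LANES;
    for (ca, cb) in a[..chunks]
        .chunks_exact(LANES)
        .zip(b[..chunks].chunks_exact(LANES))
    {
        // Independent per-lane accumulators: no cross-lane dependency to block SIMD.
        for i in 0..LANES {
            acc[i] += ca[i] * cb[i];
        }
    }

    // Scalar tail for the leftover elements.
    let mut tail = 0.0;
    for i in chunks..a.len() {
        tail += a[i] * b[i];
    }
    acc.iter().sum::<f32>() + tail
}

fn main() {
    let a: Vec<f32> = (0..1000).map(|i| i as f32).collect();
    let b: Vec<f32> = (0..1000).map(|i| (i % 7) as f32).collect();
    println!("dot = {}", dot(&a, &b));
}
```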
It's just hard to make Python fast. It can be improved, yes, but it'll stay cache-unfriendly without a redesign. Nobody wants a Python 4. :-)
Interesting tricks to optimize this function in V8.
Nice new tool to investigate code generated by macros in Rust. Indeed, you can quickly add lots of lines to the compiled code without even realizing it; in large code bases it's worth keeping this in check.
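A tiny example of what I mean (mine, and inspected here with dtolnay's well-known `cargo expand`, which may or may not be the tool from the post):

```rust
// Sketch: an innocuous-looking derive list already expands into a pile of
// generated impl blocks; a macro-expansion tool shows the post-expansion
// source so you can see how much code each derive contributes.

#[derive(Debug, Clone, PartialEq, Eq, Hash, Default, PartialOrd, Ord)]
struct Point {
    x: i64,
    y: i64,
}

fn main() {
    // The one-line derive above generated impls for eight traits; multiply
    // that by hundreds of types and the compiled crate grows fast.
    let p = Point::default();
    println!("{:?}", p.clone());
}
```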
Nice exploration which shows the many levers one can use to influence Rust compilation times. They all have their own tradeoffs of course, so don't pull them just for the sake of reducing build time.
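For reference, a hypothetical Cargo.toml sketch of a few of the usual knobs (illustrative values of my own, not the article's recommendations; see the post for the actual tradeoffs):

```toml
[profile.dev]
debug = 1                 # lighter debug info than the full default, smaller artifacts

[profile.dev.package."*"]
opt-level = 3             # optimize dependencies, keep your own crate quick to rebuild

[profile.release]
lto = "off"               # skip link-time optimization for faster release builds
codegen-units = 256       # more parallel codegen at the cost of some runtime speed
```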
A good reminder that "push it to the GPU and it'll be faster" isn't always true. If you move a workload to the GPU, you'll likely have to rethink quite a bit of how it's done.
We can expect this to be a game changer for the C++ ecosystem. A couple of examples are presented in this article.
What is premature optimization really? If you look at the full paper it might not be what you think. In any case we get back to: do the math and benchmark.
I'd like to see the equivalent for Europe. Clearly in the US things aren't always great for Internet access. The latency is likely higher than you think, and the bandwidth lower.
Or why this kind of question never has an absolute answer.
The memory models for GPU programming are complex. It isn't easy to squeeze out more performance without introducing subtle bugs.
Interesting comparison between C++ and Rust for a given algorithm. The differences are mostly what you would expect; it's nice to see them confirmed.
Looks like Linux is now the best operating system for gaming on the go.
A bit dated perhaps, and yet most of the lessons in here are still valid. If performance and parallelism matter, you'd better keep an eye on how the cache is used.
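The classic pitfall in that area is false sharing; here's a minimal sketch of mine (not from the linked lessons) of the usual fix: per-thread counters that would otherwise share a cache line get padded to a full 64-byte line each, so the lines stop ping-ponging between cores.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

#[repr(align(64))] // one counter per cache line
struct PaddedCounter(AtomicU64);

fn main() {
    const THREADS: usize = 4;
    let counters: Vec<PaddedCounter> =
        (0..THREADS).map(|_| PaddedCounter(AtomicU64::new(0))).collect();

    thread::scope(|s| {
        for c in &counters {
            s.spawn(move || {
                // Each thread hammers only its own counter; thanks to the
                // padding, no two counters live on the same cache line.
                for _ in 0..10_000_000 {
                    c.0.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });

    let total: u64 = counters.iter().map(|c| c.0.load(Ordering::Relaxed)).sum();
    println!("total = {total}");
}
```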
Or why it's hard to truly evaluate performance in complex systems. We often test things in the optimistic case.
Or how it's possible to expose an object-oriented-style API on top of a data-oriented framework without sacrificing performance.
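A rough sketch of the general idea (my own, not the framework from the post): the data lives in struct-of-arrays storage so the hot loops stay cache-friendly, while a small handle type exposes a familiar object-style API on top of it.

```rust
struct Particles {
    // Data-oriented layout: one contiguous array per field.
    xs: Vec<f32>,
    ys: Vec<f32>,
    vxs: Vec<f32>,
    vys: Vec<f32>,
}

/// Object-style view over one particle; just an index into the arrays.
struct ParticleMut<'a> {
    store: &'a mut Particles,
    i: usize,
}

impl<'a> ParticleMut<'a> {
    fn position(&self) -> (f32, f32) {
        (self.store.xs[self.i], self.store.ys[self.i])
    }
    fn apply_impulse(&mut self, dvx: f32, dvy: f32) {
        self.store.vxs[self.i] += dvx;
        self.store.vys[self.i] += dvy;
    }
}

impl Particles {
    fn particle_mut(&mut self, i: usize) -> ParticleMut<'_> {
        ParticleMut { store: self, i }
    }
    /// The hot loop stays data-oriented: straight passes over contiguous arrays.
    fn integrate(&mut self, dt: f32) {
        for i in 0..self.xs.len() {
            self.xs[i] += self.vxs[i] * dt;
            self.ys[i] += self.vys[i] * dt;
        }
    }
}

fn main() {
    let mut world = Particles {
        xs: vec![0.0; 3], ys: vec![0.0; 3],
        vxs: vec![1.0; 3], vys: vec![0.0; 3],
    };
    world.particle_mut(1).apply_impulse(0.0, 2.0);
    world.integrate(0.1);
    println!("{:?}", world.particle_mut(1).position());
}
```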