Maybe we have a path forward for performant stackful coroutines? More pieces need to fall into place, but this looks promising.
Interesting to see how it behaves in practice when passing parameters by value. Turns out there are surprising patterns in the data.
If you're wondering why the architecture is called "amd64" and why the Itanium disappeared... this is why. It was a very good stunt from AMD back then.
Interesting trend in the CPU space. CPUs execute more instructions simultaneously with each passing generation.
Indeed, CPU prefetchers are really good nowadays. Now you know what to do to keep your code fast.
SIMD instructions are indeed a must to get decent performance on current hardware.
A good example of how you can get bitten by cache coherency algorithms in the CPU.
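One common way to get bitten like this is false sharing, which may or may not be the exact case the linked article covers. The block below is a minimal sketch of the usual padding fix, assuming 64-byte cache lines (typical on x86-64 but not guaranteed); the struct names and the `CACHE_LINE` constant are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines (typical on x86-64, not universal). */
#define CACHE_LINE 64

/* Two counters meant to be updated by different threads end up on the
   same cache line: every write by one core invalidates the line in the
   other core's cache, even though the threads never touch the same field. */
struct counters_shared {
    long a;
    long b;
};

/* Aligning each counter to its own cache line avoids that ping-pong,
   at the cost of extra memory. */
struct counters_padded {
    alignas(CACHE_LINE) long a;
    alignas(CACHE_LINE) long b;
};

/* Compile-time check that the padded layout really spans two lines. */
static_assert(sizeof(struct counters_padded) == 2 * CACHE_LINE,
              "each counter should occupy its own cache line");
```

Whether the padding pays off depends on how contended the counters are, so it is worth measuring before and after.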
A bit dated perhaps, and yet most of the lessons in here are still valid. If performance and parallelism matter, you'd better keep an eye on how the cache is used.
Nice trick for highly performance-sensitive data structures. By making data CPU-local instead of thread-local, you can build a mechanism that is especially cache-friendly.
Nice exploration of the microcode patching flaw which was disclosed recently. This gives a glimpse of the high level of complexity the x86 family brings to the table.
Nice primer on the impact that too many branches in your code have on the CPU. Being mindful of this is sometimes a good way to boost performance.
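To make the idea concrete, here is a minimal sketch of trading a data-dependent branch for arithmetic. The function names are hypothetical and not taken from the linked article, and an optimizing compiler may already perform this transformation on its own, so treat it as an illustration rather than a guaranteed win.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: one conditional jump per element, which gets costly
   when the data is random and the branch predictor keeps guessing wrong. */
int64_t sum_above_branchy(const int32_t *v, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold)
            sum += v[i];
    }
    return sum;
}

/* Branchless version: the comparison evaluates to 0 or 1 and is used
   as a multiplier, so the loop body has no data-dependent jump. */
int64_t sum_above_branchless(const int32_t *v, size_t n, int32_t threshold) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (int64_t)v[i] * (v[i] > threshold);
    }
    return sum;
}
```

On sorted or otherwise predictable input the branchy version can be just as fast, so this is only worth doing after measuring.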
It's interesting to see this infamous bug reverse engineered straight from the gate layout.
Fascinating research about side-channel attacks. I learned a lot about them and about website fingerprinting here. Also interesting is the explanation of how machine learning models can get in the way of properly understanding which side channel an attack really uses, which in turn can prevent the development of genuinely useful countermeasures.
Data layout is essential for performance reasons. It is too often overlooked. If you want real speed you need to help the memory subsystem.
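As an illustration of what helping the memory subsystem can look like, here is a minimal sketch contrasting an array-of-structures layout with a structure-of-arrays layout. The `particle` types, the field names and the fixed capacity `N` are hypothetical, chosen only for the example.

```c
#include <stddef.h>

#define N 1024  /* hypothetical fixed capacity for this sketch */

/* Array of structures: summing x alone still pulls y, z and id
   through the cache, since they sit next to each x in memory. */
struct particle {
    float x, y, z;
    int id;
};

float sum_x_aos(const struct particle *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p[i].x;
    return s;
}

/* Structure of arrays: all x values are contiguous, so every cache
   line fetched by the loop contains only data the loop actually uses. */
struct particles {
    float x[N];
    float y[N];
    float z[N];
    int id[N];
};

float sum_x_soa(const struct particles *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n && i < N; i++)
        s += p->x[i];
    return s;
}
```

Which layout wins depends on the access pattern: code that always touches every field of one element at a time may be better served by the first form.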
Nice list of common portability issues one can encounter at the machine architecture level. But don't be fooled: this doesn't only have implications for C and C++, those problems leak into higher-level languages as well.
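Byte order is one classic issue of this kind, whether or not the linked list covers it in these terms. The block below is a minimal sketch of decoding a 32-bit little-endian value from a byte buffer without depending on the host's endianness or alignment rules; the helper name `load_le32` is hypothetical.

```c
#include <stdint.h>

/* Assembles the value byte by byte, so the result is the same on
   little-endian and big-endian hosts, and the buffer does not need to
   be aligned (unlike casting the pointer to uint32_t* and dereferencing). */
uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The same concern shows up in higher-level languages whenever binary file formats or network protocols are parsed by hand.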
SIMD is hard to use, and not every problem lends itself to it. But when one does, the performance gain can be great.
Luckily, these kinds of very low-level vulnerabilities are not too common and are difficult to exploit. But when they do get exploited, all hell breaks loose and you can't trust your hardware anymore.
Interesting article about what's coming for the branch predictor in the Zen 5 architecture from AMD.
A new type of attack targeting the CPU indirect branch predictor.
SIMD keeps providing interesting performance boosts for parsing workloads.