64 private links
This is definitely true. Keep all this in mind when dealing with performance questions: design properly for the task, profile and profile some more, focus on the hotspots, keep things maintainable.
Nice bag of tricks for better Rust performance at low level. The compiler is indeed helping quite a bit here.
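For illustration, here's one classic trick of that kind (my own toy example, not necessarily one from the linked article): prefer iterators over manual indexing and let the compiler get rid of the bounds checks.

```rust
// Manual indexing: the compiler must prove every `a[i]` / `b[i]` is in
// bounds before it can remove the checks, which doesn't always happen.
fn dot_indexed(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len().min(b.len());
    let mut acc = 0.0;
    for i in 0..n {
        acc += a[i] * b[i];
    }
    acc
}

// Iterator version: no indexing at all, so no bounds checks to elide,
// and the shape is usually friendlier to auto-vectorization.
fn dot_iter(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = vec![0.5f32; 1024];
    let b = vec![2.0f32; 1024];
    assert_eq!(dot_indexed(&a, &b), dot_iter(&a, &b));
    println!("dot = {}", dot_iter(&a, &b));
}
```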
This is an interesting and deeply buried optimization for the GNU C++ STL implementation. I didn't expect anything like this.
Interesting tricks to optimize this function in V8.
What is premature optimization really? If you look at the full paper it might not be what you think. In any case we get back to: do the math and benchmark.
Nice simple fix in Git but with a large impact on backups. A good proof that profiling and keeping an eye on algorithmic complexity can go a long way in improving software.
Good proposals to shorten the time spent executing tests. Tighter feedback loops make everyone happy.
On the ever-expanding domain of applicability for constexpr, more is coming to C++26. This is definitely welcome and should keep making it easier to use.
Nice little comparison of raw loops and ranges in C++. As always, measure before making assumptions... Unsurprisingly, it ends up in the usual readability vs performance debate.
Looks like a nice resource to get better at finding the root cause of performance regressions and optimising code.
A good look back at parallelisation and multithreading as a means to optimise. This is definitely a hard problem, and it has indeed got a bit easier with recent languages like Rust.
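As a rough illustration of what "easier" means here, a minimal sketch of my own (not from the linked article) using Rust's scoped threads:

```rust
use std::thread;

// Sum a slice on two threads. Scoped threads (std::thread::scope, Rust 1.63+)
// let us borrow `data` from the stack: the compiler guarantees both threads
// are joined before the borrow ends, so no Arc or 'static bounds are needed.
fn parallel_sum(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let h1 = s.spawn(|| left.iter().sum::<u64>());
        let h2 = s.spawn(|| right.iter().sum::<u64>());
        h1.join().unwrap() + h2.join().unwrap()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000).collect();
    assert_eq!(parallel_sum(&data), 500_500);
    println!("sum = {}", parallel_sum(&data));
}
```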
Definitely ugly in the end. Still, it does the trick.
A quick primer on compile-time evaluation in Rust.
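As a taste of it, a tiny sketch of my own (not taken from the linked primer): a const fn the compiler evaluates entirely at build time, while the same function remains usable at run time.

```rust
// A const fn can run both at compile time and at run time.
const fn fibonacci(n: u64) -> u64 {
    let mut a: u64 = 0;
    let mut b: u64 = 1;
    let mut i = 0;
    while i < n {
        let next = a + b;
        a = b;
        b = next;
        i += 1;
    }
    a
}

// Evaluated entirely by the compiler: the binary just contains 832040.
const FIB_30: u64 = fibonacci(30);

fn main() {
    println!("fib(30) = {}", FIB_30);
    // The same function still works with a value only known at run time.
    let n = std::env::args().count() as u64;
    println!("fib(argc) = {}", fibonacci(n));
}
```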
This is indeed easy to get wrong. The misconception is also very widespread, so it's good to debunk it.
More of a marketing announcement than a real research paper. Still, it's nice to see smaller models being optimized to run on mobile devices. This will get interesting when it's all local-first and coupled with symbolic approaches.
This is too often overlooked, but table lookups can help with performance if done well.
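A small self-contained sketch of the idea (my own example, not from the linked article): precompute a 256-entry table once, so the per-byte work becomes a single array index.

```rust
// Precompute the number of set bits for every possible byte value.
// The whole table is built at compile time.
const BIT_COUNTS: [u8; 256] = {
    let mut table = [0u8; 256];
    let mut i: usize = 0;
    while i < 256 {
        table[i] = (i as u8).count_ones() as u8;
        i += 1;
    }
    table
};

// Purely illustrative: for popcount specifically, `count_ones()` already
// maps to a hardware instruction on most CPUs, so measure before choosing
// a table over the direct computation.
fn popcount_bytes(data: &[u8]) -> u64 {
    data.iter().map(|&b| BIT_COUNTS[b as usize] as u64).sum()
}

fn main() {
    let data = [0b1011_0010u8, 0xFF, 0x00];
    assert_eq!(popcount_bytes(&data), 4 + 8 + 0);
    println!("set bits: {}", popcount_bytes(&data));
}
```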
A paper listing patterns to reduce latency as much as possible. There are lesser known tricks in there.
Another good example of how to speed up some Python code with nice gains.
Definitely something to keep in mind when using sampling profilers. They're useful and give you a starting point for the analysis, but they're far from enough to find the root cause of a performance issue.
Excellent work to improve Llama execution speed on CPU. It probably has all the tricks of the trade to accelerate this compute kernel.