Excellent work on improving Llama execution speed on CPU. It probably uses every trick of the trade to accelerate this compute kernel.
With some tuning, SQLite can go a long way, even for server-type workloads. There are still a few caveats, but in some cases this can reduce complexity and cost quite a bit.
Nice balanced view on some of Rust's characteristics. This is much less naive than some of the "Rust is great" posts out there.
As usual, measure instead of just assuming when you want to optimize something. This is an interesting case in Python using Numba.
Indeed this. It's not only about payload size, it's also about CPU consumption. Our profession still assumes far too readily that users will get faster CPUs on a regular basis.
It's nice to see a new benchmark being published. It seems to follow real-life scenarios. We can expect browser engine performance to improve.
A response to the "The Hunt for the Missing Data Type" article. There are indeed potential solutions, but they're not really used or usable in the industry right now. Maybe tomorrow.
Indeed, graphs are peculiar beasts. When dealing with graph-related problems there are so many choices to make that it's hard or impossible to come up with a generic solution.
Interesting take, even though I'm not sure I buy it completely. It makes a good plea for aiming at power efficiency and squeezing performance out of software.
Interesting library if you have to do a lot of heavy analysis work with strings.
This is indeed an odd situation... there is no good explanation for why it is this way.
Nice exploration of the GitLab database schema. It highlights quite a few of the choices made with an eye on performance.
Very interesting approach to JSON parsing. Comes with a very thorough performance analysis.
Not necessarily practical advice for most of our daily code. Still, this exhibits interesting low-level details about argument passing. It might come in handy in a few cases, so it's worth keeping in mind.
A very precise and thorough article about GPU occupancy: what it is, how it relates to perceived performance, its potential relationship with cache thrashing, and the tools you can use to measure it on AMD GPUs.
Very nice collection of stories from the trenches of Firefox development. Lots of lessons learned to unpack about optimizing for the right thing, tooling, telemetry and so on.
Unsurprisingly, this is highly dependent on the actual code, not only on the hardware.
Seen this a bit too often indeed. When people learn about std::move they tend to sprinkle it everywhere, which prevents proper optimizations. Its use should usually be fairly limited.
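
To make this concrete, here is a minimal sketch of one common case (my own illustration, not code from the linked article): wrapping a returned local variable in std::move. The return expression is then no longer the plain name of a local object, so the compiler cannot apply copy elision (NRVO) and has to call the move constructor. `Tracked`, `g_moves` and the `make_*` / `moves_for_*` functions are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical global counter, only here to make move constructions visible.
int g_moves = 0;

struct Tracked {
    std::vector<int> data;
    Tracked() : data(1000, 42) {}
    Tracked(const Tracked& other) : data(other.data) {}
    Tracked(Tracked&& other) noexcept : data(std::move(other.data)) { ++g_moves; }
};

// Returning the local by name: eligible for NRVO, so the compiler may
// construct `t` directly in the caller's storage with no move at all.
Tracked make_plain() {
    Tracked t;
    return t;
}

// Returning std::move(t): the operand is a function call result, not the
// name of a local object, so elision is off the table and the move
// constructor always runs.
Tracked make_with_move() {
    Tracked t;
    return std::move(t);
}

// Number of move constructions observed for one call to make_plain().
int moves_for_make_plain() {
    g_moves = 0;
    Tracked t = make_plain();
    assert(!t.data.empty());
    return g_moves;
}

// Number of move constructions observed for one call to make_with_move().
int moves_for_make_with_move() {
    g_moves = 0;
    Tracked t = make_with_move();
    assert(!t.data.empty());
    return g_moves;
}
```

With mainstream compilers `moves_for_make_plain()` typically reports no move at all, while `moves_for_make_with_move()` always reports at least one. A move of a vector is cheap, so the point is less the cost here than the habit: the plain `return t;` is never worse.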
Good reminder that false sharing is a real thing. It's easier to encounter than you think when you start to dabble in multi-threading.
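
For readers who haven't run into it yet, a rough sketch of how it happens (my own illustration, not taken from the linked article): two threads update two independent counters that happen to sit in the same cache line, so the cores keep invalidating each other's copy of that line. Giving each counter its own line avoids it. The 64-byte line size is an assumption that holds on most x86-64 machines but not everywhere, and `PackedCounters`, `PaddedCounters` and `hammer` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Assumption: 64-byte cache lines (typical on x86-64, not universal).
constexpr std::size_t kCacheLine = 64;

// Both counters fit in one cache line: prone to false sharing.
struct PackedCounters {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Each counter is aligned to its own cache line.
struct PaddedCounters {
    alignas(kCacheLine) std::atomic<long> a{0};
    alignas(kCacheLine) std::atomic<long> b{0};
};

// Two threads, each incrementing only its own counter. The threads never
// touch the same variable, yet with PackedCounters they still contend
// for the same cache line.
template <typename Counters>
long hammer(Counters& c, long iterations) {
    std::thread t1([&c, iterations] {
        for (long i = 0; i < iterations; ++i)
            c.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread t2([&c, iterations] {
        for (long i = 0; i < iterations; ++i)
            c.b.fetch_add(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return c.a.load() + c.b.load();
}
```

Timing `hammer` on a `PackedCounters` instance against a `PaddedCounters` one, on a multi-core machine, is the way to see the effect. I'm deliberately not quoting numbers since they depend heavily on the hardware.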
This is indeed a nice improvement. I hope they keep working in this direction.