Early days, we'll need to see the pricing and reviews. I'm obviously excited to see KDE shipping by default on even more consumer devices. Let's hope it sells even better than the Steam Deck.
Heavy on the NVIDIA specifics, but it looks like a very comprehensive overview of the important concepts in a GPU.
Maybe we have a path forward for performant stackful coroutines? More pieces need to fall into place but this looks promising.
Interesting to see how it behaves in practice when passing parameters by value. Turns out there are surprising patterns in the data.
Nice explanation of the very early steps leading to the kernel loading.
This is indeed an open question. Looks like it has the potential to lead to interesting boards in any case.
Maybe it's time to stop obsessing about scale and distributed architectures? Hardware has improved quite a bit in the right places, especially storage.
If you're wondering why the architecture is called "amd64" and why Itanium disappeared... this is why. It was a very good stunt from AMD back then.
Very interesting deep dive pointing to a very flawed firmware.
Interesting point. Let's not forget that those devices indeed don't give us enough access to run whatever operating system we want on them.
Interesting trend in the CPU space. We're getting more simultaneous instructions with the passing generations.
Long but interesting chapter which shows how GPU architecture works and how it differs from TPUs. This is unsurprisingly written in the context of large model training.
OK, this is completely useless but definitely a fun project.
Indeed, CPU prefetchers are really good nowadays. Now you know what to do to keep your code fast.
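To make the point a bit more concrete, here is a minimal sketch in C. It assumes the practical advice is "keep memory accesses predictable"; the function names are mine and purely illustrative, not taken from the linked article. Both sums do the same work, only the order in which memory is visited changes.

```c
#include <stddef.h>
#include <stdint.h>

/* Sequential walk: addresses grow by a fixed step, which is the kind of
 * pattern hardware prefetchers are generally good at predicting. */
long sum_sequential(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Same additions, but visiting elements in the order given by `order`.
 * With a shuffled order the next address is hard to predict. */
long sum_in_order(const int *a, const size_t *order, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[order[i]];
    return total;
}

/* Fill `order` with a pseudo-random permutation of 0..n-1
 * (Fisher-Yates shuffle driven by a small LCG, illustration only). */
void shuffled_order(size_t *order, size_t n, uint64_t seed)
{
    for (size_t i = 0; i < n; i++)
        order[i] = i;
    for (size_t i = n; i > 1; i--) {
        seed = seed * 6364136223846793005ULL + 1442695040888963407ULL;
        size_t j = (size_t)((seed >> 33) % i);
        size_t tmp = order[i - 1];
        order[i - 1] = order[j];
        order[j] = tmp;
    }
}
```

Timing both functions on an array much larger than the last-level cache should show the gap, though the exact numbers depend heavily on the CPU, so none are claimed here.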
A good reminder that "push it to the GPU and it'll be faster" isn't true. If you move a workload to the GPU you likely have to rethink quite a bit how it's done.
Definitely a cool hardware hack. There are really many form factors and hardware options to explore for a better XR experience.
The hardware is there, the software not so much. Now I'd argue that the author overestimates the availability of said hardware in households.
Nice trick for highly performance-sensitive data structures. By making data CPU-local instead of thread-local, you can build a mechanism which is especially cache-friendly.
Nice post explaining the need for ACPI or Device Tree and how they are leveraged by kernels.
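As a rough illustration of the CPU-local idea, here is a minimal sketch of a counter sharded per CPU in C11. This is not the linked article's implementation: `MAX_CPUS`, `CACHE_LINE` and all the names are assumptions of mine, and the caller is expected to pass in the current CPU index (on Linux with glibc it could come from `sched_getcpu()`).

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_CPUS   64   /* assumed upper bound on CPU count (hypothetical) */
#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* One slot per CPU, each aligned to its own cache line so that two CPUs
 * never write to the same line (no false sharing). */
struct cpu_slot {
    _Alignas(CACHE_LINE) atomic_long value;
};

struct cpu_counter {
    struct cpu_slot slots[MAX_CPUS];
};

/* Add `delta` to the slot of the given CPU. The update stays atomic
 * because a thread may migrate to another CPU right after its CPU index
 * was read, so two threads can briefly target the same slot. */
void cpu_counter_add(struct cpu_counter *c, unsigned cpu, long delta)
{
    atomic_fetch_add_explicit(&c->slots[cpu % MAX_CPUS].value, delta,
                              memory_order_relaxed);
}

/* Reading sums all slots; the result is approximate while writers run. */
long cpu_counter_read(struct cpu_counter *c)
{
    long total = 0;
    for (size_t i = 0; i < MAX_CPUS; i++)
        total += atomic_load_explicit(&c->slots[i].value,
                                      memory_order_relaxed);
    return total;
}
```

Compared to thread-local slots, the number of shards is bounded by the number of CPUs rather than the number of threads, and each slot tends to stay in the cache of the CPU that writes it. A zero-initialized `static struct cpu_counter` is a valid starting state.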
It looks like analog chips for neural network workloads are on the verge of finally becoming reality. This would reduce power consumption by an order of magnitude and hopefully more later on. Very early days for this new attempt, let's see if it delivers on its promises.