I've just spent a week working with a brilliant contributor, leeeanh, who helped me significantly optimize the renderer's code. It turns out this mysterious contributor is an Assistant Professor of Computer Science at the University of Tokyo, and we were fortunate that he stopped by our project through the serendipity of open-source collaboration.
His method was simple: identify loops that are unnecessary or that can be delegated to hardware (AVX2 and AVX-512 vector instructions), then remove them, simplify them, or optimize their execution. By applying his approach, I was able to eliminate 176,000 conditional tests per second in DSD512 playback alone. And you can hear the difference.
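To illustrate the kind of change described above, here is a minimal sketch in C. It is not the renderer's actual code: the function names, the `invert` flag, and the float sample format are all hypothetical, chosen only to show the pattern. The first version tests a condition for every sample even though the condition never changes during the loop; the second tests it once and leaves two branch-free loops, which a compiler is generally better able to turn into AVX2 or AVX-512 instructions.

```c
#include <stddef.h>

/* Hypothetical "before": the invert flag is tested once per sample,
 * although its value cannot change while the loop runs. */
void process_branchy(float *out, const float *in, size_t n, int invert)
{
    for (size_t i = 0; i < n; i++) {
        if (invert)
            out[i] = -in[i];
        else
            out[i] = in[i];
    }
}

/* Hypothetical "after": the test is hoisted out of the loop, so it runs
 * once per buffer instead of once per sample. Each loop body is now
 * straight-line code with no conditional inside it. */
void process_hoisted(float *out, const float *in, size_t n, int invert)
{
    if (invert) {
        for (size_t i = 0; i < n; i++)
            out[i] = -in[i];
    } else {
        for (size_t i = 0; i < n; i++)
            out[i] = in[i];
    }
}
```

Both functions produce the same output; only the number of conditional tests differs. An optimizing compiler may perform this transformation on its own in simple cases like this one, so the sketch shows the principle rather than a guaranteed gain.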
The results are remarkable, as Dominique and Pierre-Marin have noted.
The lesson I take from this experience is that code simplicity and optimization bring transparency, dynamics, and musicality. Conversely, stacking software layers increases noise and produces the opposite effect.