With sparse attention, very interesting. It seems GQA is a thing of the past.
I especially love DeepSeek’s ‘public research’ aspect: they trained this and Terminus the same way, so the attention schemes are (more-or-less) directly comparable. That’s awesome.
GLM 4.6 is reportedly about to drop too, which is great, as 4.5 is without a doubt my daily driver now.