• brucethemoose@lemmy.world
    4 days ago

    With sparse attention, very interesting. It seems GQA is a thing of the past.

    I especially love DeepSeek’s ‘public research’ aspect: they trained this and Terminus the same way, so the attention schemes are (more or less) directly comparable. That’s awesome.

    GLM 4.6 is reportedly about to drop too, which is great, as 4.5 is without a doubt my daily driver now.