• panda_abyss@lemmy.ca
    2 months ago

    I’ve been looking forward to trying this one.

    I have some use cases where I need to do large-scale data cleanup, but using an LLM is overkill and I already get good results with smaller embedding models.

    I want to try this model and take advantage of its Matryoshka dimension reduction: use shorter, truncated embeddings for the simple cases and keep more dimensions for the progressively more complex ones.
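
For anyone unfamiliar with the Matryoshka trick: as I understand it, a model trained that way packs the most useful information into the leading dimensions, so you can keep only the first k values of each vector and re-normalize. A rough sketch in plain Python; `truncate_embedding` and the sample vector are made up for illustration and are not part of any library:

```python
import math

def truncate_embedding(vec, k):
    """Keep the first k dimensions of vec and re-normalize to unit length."""
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all zeros, nothing to normalize
    return [x / norm for x in head]

# Made-up 4-dimensional vector standing in for a real model's output.
full = [3.0, 4.0, 1.0, 2.0]
small = truncate_embedding(full, 2)
print(len(small), small)
```

The re-normalization matters if you compare vectors with a plain dot product afterwards, since the truncated vector is no longer unit length on its own.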

    • mierdabird@lemmy.dbzer0.com
      2 months ago

      I'm not really sure I understand how these work. Do you just feed it a large text document, like a transcript, and it turns it into a more machine-readable vector format?

      Or is it just a much smaller LLM that’s more optimized for reading than generating?

      • panda_abyss@lemmy.ca
        2 months ago

        Basically, yes.

        I've only built my own systems that use sentence transformers.

        You pass in a list of strings and it generates a list of vectors, one per string. Those vectors can then be used for all sorts of similarity analysis and retrieval.
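
To make that flow concrete, here is a minimal sketch in plain Python. The `toy_embed` function is a hypothetical stand-in that just counts words from a fixed vocabulary; with the sentence-transformers library, a call like `model.encode(texts)` would take its place and return learned vectors instead. The documents and query are made up.

```python
import math

def toy_embed(texts, vocab):
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    vectors = []
    for text in texts:
        words = text.lower().split()
        vectors.append([float(words.count(term)) for term in vocab])
    return vectors

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

vocab = ["cat", "dog", "invoice", "payment"]
docs = [
    "the cat chased the dog",
    "invoice overdue payment reminder",
    "dog food for your dog",
]
query = "late payment on invoice"

# List of strings in, list of vectors out.
doc_vecs = toy_embed(docs, vocab)
query_vec = toy_embed([query], vocab)[0]

# Retrieval: rank documents by similarity to the query vector.
ranked = sorted(zip(docs, doc_vecs), key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
best_match = ranked[0][0]
print(best_match)
```

A real embedding model would also match texts that share meaning but no words, which this word-count toy cannot do; the ranking step stays the same either way.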