• Max-P@lemmy.max-p.me
    10
    ·
    9 months ago

    No, but if they forget to strip those before training the models, the models will start spitting out licenses everywhere, which would be annoying for AI companies.

    It’s so easily fixed with a simple regex, though, that it’s not very useful. But poisoning the data is theoretically possible.
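To make the "simple regex" point concrete, here is a rough sketch of how a scraper might drop license footers before training. The pattern and the `strip_license_footer` helper are hypothetical and only cover common Creative Commons spellings; a real cleaning pipeline would probably need more cases.

```python
import re

# Hypothetical pattern: matches common Creative Commons identifiers
# such as "CC BY-NC-SA 4.0", "CC-BY" or "CC0". This is an assumption, not a
# complete list of the license strings people actually use.
LICENSE_LINE = re.compile(r"\bCC[ -]?(?:BY(?:-(?:NC|SA|ND))*|0)\b", re.IGNORECASE)


def strip_license_footer(comment: str) -> str:
    """Remove trailing lines that mention a CC license, plus blank lines around them."""
    lines = comment.rstrip().split("\n")
    # Walk backwards from the end so only a footer is removed, not a mid-comment mention.
    while lines and (not lines[-1].strip() or LICENSE_LINE.search(lines[-1])):
        lines.pop()
    return "\n".join(lines)


print(strip_license_footer("Nice post!\n\nCC BY-NC-SA 4.0"))
```

This is a heuristic: a comment whose last line merely talks about a CC license would also lose that line.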

    • t3rmit3@beehaw.org
      1
      ·
      9 months ago

      Only if enough people were doing this for it to constitute an algorithmically reducible pattern.

      If you could get everyone who mentions a specific word or subject to put a CC license in their comment, then an ML model trained on those comments would likely output the license name when that subject was mentioned. But models don’t just randomly insert strings they’ve seen without context.
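A toy illustration of that point, using plain counting rather than an actual trained model: if the license only ever appears next to one subject, the association is tied to that subject and is absent elsewhere. The function name, the example comments, and the "kittens" subject are all made up for the sketch.

```python
from collections import Counter


def license_rate_by_subject(comments, subject, license_tag):
    """Return (rate with subject, rate without subject) of comments containing license_tag."""
    counts = Counter()
    for text in comments:
        has_subject = subject in text.lower()
        has_license = license_tag in text
        counts[(has_subject, has_license)] += 1

    def rate(has_subject):
        total = counts[(has_subject, True)] + counts[(has_subject, False)]
        return counts[(has_subject, True)] / total if total else 0.0

    return rate(True), rate(False)


# Made-up corpus: only comments about kittens carry the license string.
comments = [
    "I love kittens CC BY-SA",
    "kittens are great CC BY-SA",
    "dogs are fine",
    "weather is nice",
]
with_subject, without_subject = license_rate_by_subject(comments, "kittens", "CC BY-SA")
print(with_subject, without_subject)
```

A real language model learns far richer statistics than this, but the same conditional pattern is what it would pick up: the license string becomes likely in the context where it was seen, not everywhere.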