For example, projects trying to detect artifacts in neural-network-generated data using a “simple” algorithm, the same way compression artifacts can be seen when data is analyzed. Anything that isn’t “our neural network detects other neural networks” and that isn’t some proprietary bullshit.
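
To make that concrete, here is a rough sketch of what such a “simple” detector could look like. The frequency-peak heuristic and the cutoff are illustrative assumptions, not a validated method:

```python
# Illustrative sketch: flag images whose frequency spectrum shows the kind of
# periodic peaks that upsampling layers tend to leave behind, the same way JPEG
# compression shows up as 8x8 block patterns. The heuristic and the threshold
# are assumptions for demonstration, not a proven detector.
import numpy as np

def spectral_peak_score(gray: np.ndarray) -> float:
    """Peak-to-median energy ratio in the high-frequency band of a grayscale image."""
    log_spec = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(gray))))
    h, w = log_spec.shape
    yy, xx = np.ogrid[:h, :w]
    # Keep only frequencies far from the center (the DC term and low frequencies).
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 > (min(h, w) // 4) ** 2
    band = log_spec[mask]
    return float(band.max() / (np.median(band) + 1e-9))

# Usage: convert the image to a grayscale float array first, e.g. with Pillow:
#   score = spectral_peak_score(np.asarray(Image.open("img.png").convert("L"), float))
# A score well above what natural photos produce would flag the image for review.
```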

Projects trying to block scrapers as best they can or feed them garbage data.
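
For the “feed them garbage” side, the shape could be as simple as this sketch; the scraper signatures and the noise generator are placeholders, not a real defense:

```python
# Hypothetical sketch of the "feed them garbage" idea: serve plausible-looking
# noise to clients whose User-Agent matches known scraper signatures.
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

SCRAPER_HINTS = ("GPTBot", "CCBot", "Bytespider")  # assumed bot signatures
WORDS = "the of and a to in is it you that he was for on are with as".split()

def garbage(n: int = 200) -> str:
    return " ".join(random.choice(WORDS) for _ in range(n))

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        body = garbage() if any(hint in ua for hint in SCRAPER_HINTS) else "the real page"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())

# HTTPServer(("", 8080), Handler).serve_forever()
```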

Some collaborative networks for detecting well-known data, like images or text, which has very likely been generated by a neural network, and storing it in a database. Only if the methods of detection are explained and can be verified, of course; otherwise anybody can claim anything.
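
One verifiable building block for such a database, as a sketch: publish perceptual hashes of flagged media so anyone can independently re-check a match without exchanging the file itself. The choice of the third-party `imagehash` library and the distance threshold are my assumptions:

```python
# Sketch of a shared "flagged media" store keyed by perceptual hash.
# Assumes the third-party `imagehash` and Pillow libraries are installed.
import imagehash
from PIL import Image

flagged = []  # (perceptual hash, public explanation of why it was flagged)

def flag(path: str, reason: str) -> None:
    flagged.append((imagehash.phash(Image.open(path)), reason))

def lookup(path: str, max_distance: int = 8):
    """Return reasons for any stored hash within `max_distance` differing bits."""
    h = imagehash.phash(Image.open(path))
    return [(reason, h - known) for known, reason in flagged if h - known <= max_distance]
```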

It would be nice to have an updating pinned post or something with links to research or projects trying to untangle this mess.

The only project I can think of now: https://xeiaso.net/blog/2025/anubis/

    • RawHex@lemmy.ml (OP) · 3 days ago

      Yep, that’s nice, although it seems to be proprietary, which isn’t ideal; it’s the last thing we need now. Companies exploited the hell out of everything, and now companies/universities are exploiting the solutions too, when there’s absolutely nothing stopping them from being open.

    • Blue_Morpho@lemmy.world · 3 days ago

      It’s interesting, but it can’t work outside of a lab. The example they gave was watermarking a picture of a cow with a purse: if every cow picture has a hidden purse watermark, an AI will be trained into categorizing a cow as a purse.

      But making that work in the real world would require every artist, and especially the stock photo sites, to agree to watermark every cow with a purse. If everyone doesn’t pick a consistent purse watermark for cows, it becomes noise that gets trained out. Just like training a model to identify a cow: sometimes the picture has a farmhouse in it, other times grass, other times birds, and the model learns “cow” because that’s the consistent part. Without a purse watermarked into the majority of cow photos everywhere, the AI will learn “cow”.
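
      For reference, the watermarking step itself is trivial; the hard part is everyone blending in the same pattern. A toy sketch, where the pattern, strength, and size values are all made up:

      ```python
      # Toy version of the watermarking step: blend one fixed low-amplitude
      # pattern into every image. The point is that the pattern must be
      # identical across the whole corpus for the poisoning to work.
      import numpy as np

      rng = np.random.default_rng(42)
      SHARED_PATTERN = rng.normal(0.0, 1.0, (512, 512, 3))  # stand-in for an agreed secret

      def watermark(img: np.ndarray, strength: float = 2.0) -> np.ndarray:
          """Add a barely visible perturbation; `strength` is in 0-255 pixel units."""
          h, w, c = img.shape  # assumes images no larger than the pattern; tile for bigger ones
          out = img.astype(float) + strength * SHARED_PATTERN[:h, :w, :c]
          return np.clip(out, 0, 255).astype(np.uint8)
      ```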

    • RawHex@lemmy.ml (OP) · 2 days ago

      I’m going to link this at the current revision, so that it makes sense in the future: https://en.wikipedia.org/w/index.php?title=Transformer_(deep_learning)&oldid=1333135164

      Read the first line from the link; I’ll add it here if you’re lazy: “In deep learning, the transformer is an artificial neural network…”

      Do you know what “GPT” stands for? “Generative Pre-trained Transformer”.

      What did you think LLMs use? They’re literally just neural networks stacked as much as possible. That’s why they require all of those data centers: their only solution to the problem is adding more neural nets and more data, which means more hardware; at this point it’s borderline brute forcing. Sure, you can mention the “clever” tricks they use to “tokenize” words at the beginning, but the embedding that step feeds is still a neural net in itself. Don’t get confused by their terminology: every single bit of the “technology” has an impressive-sounding name until you see how it actually works and smack your forehead so hard it leaves a mark forever.
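
      To make the “stacked neural nets” point concrete, here is a minimal GPT-shaped skeleton, sketched in PyTorch with arbitrary toy sizes (real models also add a causal attention mask, omitted here for brevity):

      ```python
      # A sketch of the GPT shape: an embedding layer feeding a stack of
      # identical attention + MLP blocks, with a linear head on top.
      import torch
      import torch.nn as nn

      class Block(nn.Module):
          def __init__(self, d: int):
              super().__init__()
              self.ln1, self.ln2 = nn.LayerNorm(d), nn.LayerNorm(d)
              self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
              self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

          def forward(self, x):
              h = self.ln1(x)
              x = x + self.attn(h, h, h)[0]      # residual around self-attention
              return x + self.mlp(self.ln2(x))   # residual around the MLP

      class TinyGPT(nn.Module):
          def __init__(self, vocab: int = 50_000, d: int = 256, n_layers: int = 12):
              super().__init__()
              self.embed = nn.Embedding(vocab, d)  # the "tokenize then embed" step: a trained lookup table
              self.blocks = nn.Sequential(*[Block(d) for _ in range(n_layers)])  # scaling up = more of these
              self.head = nn.Linear(d, vocab)      # scores for the next token

          def forward(self, ids):                  # ids: (batch, sequence) token integers
              return self.head(self.blocks(self.embed(ids)))

      logits = TinyGPT()(torch.randint(0, 50_000, (1, 16)))  # (1, 16, 50000) next-token scores
      ```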

      • cronenthal@discuss.tchncs.de · 2 days ago

        Oh, you’re absolutely right. I didn’t realize that GPTs are of course an ANN variant; I always envisioned them as essentially very large and boring vector databases.

        I might want to rephrase: not all neural networks are LLMs.

        I personally hate the current “AI” scam with all my heart and I’m so very aware of the extremely limited utility and unsustainable resource demands of the GPT approach. But I have no problem with the more abstract concept of neural networks per se. I expect them to be quite fundamental to any attempt at “real” AI, if we ever get past the current craze.

        • RawHex@lemmy.ml (OP) · 2 days ago

          > attempt at “real” AI

          I’m going to argue that there’s no such thing as “real AI”. We are going to create replicas of brains only once we understand them fundamentally, and I mean to the point where we can explain them the same way we know how a CPU architecture works. Right now I think we’re insanely far from that. We barely understand brain diseases or how exactly neurotransmitters work, let alone big structures of neurons.

          My argument is, we don’t even know what “real AI” means, because we don’t know what “I” means yet.

            • RawHex@lemmy.ml (OP) · 2 days ago

              What’s funny about current GPTs is how much manual adjustment is being done on them, when the whole idea of building them was that they would “adjust themselves”, which of course was total bullshit from the start.

  • Blue_Morpho@lemmy.world · 3 days ago

    My idea would be to implement PGP-like encryption into everything. A single user reading a Lemmy thread would only see a little extra computation delay, but that computational load would become cost-prohibitive for a scraper.
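
    The same asymmetry can be had with a hash puzzle rather than actual encryption; a sketch, where the difficulty value is an arbitrary assumption:

    ```python
    # Sketch of the economics: a hash puzzle the client must solve per request.
    # One reader pays milliseconds; a scraper fetching millions of pages pays
    # that cost at scale, while the server verifies with a single hash.
    import hashlib
    import itertools
    import os

    DIFFICULTY = 16  # required leading zero bits; ~2**16 hash attempts on average

    def meets_target(challenge: bytes, nonce: int) -> bool:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0

    def solve(challenge: bytes) -> int:
        """Client side: brute-force a nonce. Cheap once, expensive a million times."""
        return next(n for n in itertools.count() if meets_target(challenge, n))

    challenge = os.urandom(16)             # server issues a fresh challenge per request
    nonce = solve(challenge)               # client burns a little CPU
    assert meets_target(challenge, nonce)  # server check is a single hash
    ```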

    • RawHex@lemmy.ml (OP) · 3 days ago

      Well, that’s what the project I linked does. I’m not sure it solves all of the issues right now, but it’s definitely a start.

    • RawHex@lemmy.ml (OP) · 2 days ago

      Cool, but again, it seems proprietary, which is not ideal. Also, isn’t it a bit backwards to add artifacts instead of looking for ways to detect artifacts in generated images, so that we catch them early and avoid AI content in the first place?