• Dr. Moose@lemmy.world · 2 months ago

    Lots of misinformation in this thread. Yes, they have it and it’s good, but it’s probably nowhere close to 99.9% accurate.

    The primary way to detect AI is to inject a fingerprint into the AI’s output in the first place. This means only the model creators can do that. We don’t know exactly how the fingerprint works, but it can be as simple as preferring one synonym over another. For example, consistently preferring words like “illustrate” or “peer” quickly adds up to a statistical fingerprint.
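
    As a toy illustration, a check along those lines just counts how often the “preferred” synonyms appear. The word pairs and the ~50% human baseline below are invented for the example; the real scheme is secret.

    ```python
    # Toy sketch of a synonym-preference fingerprint check. The pairs and the
    # ~0.5 human baseline are assumptions for illustration, not OpenAI's
    # actual (secret) scheme.
    import re

    # Hypothetical watermark: the model was biased toward the first word of each pair.
    SYNONYM_PAIRS = [
        ("illustrate", "show"),
        ("peer", "look"),
        ("utilize", "use"),
        ("moreover", "also"),
    ]

    def preference_score(text: str) -> float:
        """Fraction of synonym-pair hits that used the 'preferred' word."""
        words = re.findall(r"[a-z]+", text.lower())
        counts = {w: 0 for pair in SYNONYM_PAIRS for w in pair}
        for w in words:
            if w in counts:
                counts[w] += 1
        preferred = sum(counts[a] for a, _ in SYNONYM_PAIRS)
        total = preferred + sum(counts[b] for _, b in SYNONYM_PAIRS)
        return preferred / total if total else 0.5

    # Human text should hover near 0.5; long, unedited watermarked text drifts
    # toward 1.0. Short or paraphrased text gives too few hits to judge.
    ```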

    These techniques pre-date ChatGPT itself and do work! However, there are a lot of caveats:

    • The fingerprint has to be trained for each model, meaning each model version performs slightly differently, and only the owners know the fingerprint.
    • The fingerprint test can only work on longer bodies of text that are not modified further.
    • Extending the model with more complex instructions (like a character or tone) or RAG can significantly decrease the effectiveness.

    The industry is understandably very secretive about it, but your low-effort ChatGPT copy/paste can be detected by OpenAI and nobody else.

    As for a public release of the fingerprint: they can’t, as it could be reverse-engineered, so it’s only valuable as an internal tool for now. Also, if released, it would serve no real purpose, as detection can easily be defeated by remixing the content to dilute the fingerprint.

    • EnderMB@lemmy.world · 2 months ago

      Agreed. Frankly, if someone were to say “we can detect with 99% accuracy”, I imagine the response would be “well, clearly your measurements are wrong; find the issue and come back to us when it’s fixed”.

    • conciselyverbose@sh.itjust.works · 1 month ago

      but your low-effort ChatGPT copy/paste can be detected by OpenAI and nobody else

      Low-effort copy/pastes can absolutely be detected by people who aren’t OpenAI. The combination of “advanced” vocabulary and excessively formal grammar, used correctly but with clear and significant comprehension gaps, is pretty damn consistent. You won’t get perfect reliability, but you’ll catch most of it without a huge number of false positives.

      Real people don’t sound like GPT.

      • Dr. Moose@lemmy.world · 1 month ago

        No, that’s in no way a reliable way of catching anyone, and I hope people smarten up and avoid this snake oil entirely. I’m borderline jealous of how these “AI catchers” are making so much money from straight-up snake oil.

        • conciselyverbose@sh.itjust.works · 1 month ago

          An algorithm can’t.

          Plenty of humans absolutely can. LLM writing is genuinely fucking terrible. It has the slightly stilted over-formality of most non-native speakers, without the intelligence that fluency in a second language implies.

          Flawless grammar with a complete absence of any sign of intelligence is not something you get from humans.

          • Dr. Moose@lemmy.world · 1 month ago

            The “can” is irrelevant here. A checking tool has to be reliable to be useful. What’s the use of a checker that maybe detects something, sometimes, somewhat successfully?

            • conciselyverbose@sh.itjust.works · 1 month ago

              There’s a massive gap between “you can’t make a tool” and “you can’t identify it”.

              The problem with a tool is the exact same as the issue with LLMs to begin with: it does not resemble intelligence or comprehension in any way, so it cannot use them as an indicator.

              But the use of LLMs is absolutely identifiable to moderately intelligent humans, because LLM output has raw language skills wildly inconsistent with every other skill that is part of writing.

              • Dr. Moose@lemmy.world · 1 month ago

                What’s even the point of your argument? That a detective can figure out who used AI? Yes, detectives can figure out most stuff. This is completely irrelevant to the topic at hand, my dude.

                • conciselyverbose@sh.itjust.works · 1 month ago

                  What are you talking about with “detectives”?

                  You said “nobody can identify LLM use” when any moderately intelligent human can identify LLM output pretty easily. It explodes off the page.

  • chiisana@lemmy.chiisana.net · 2 months ago

    They’re keeping everything anyway, so what’s preventing them from doing a DB lookup to see if it (given a large enough passage of text) exists in their output history?

    • _edge@discuss.tchncs.de · 2 months ago

      I believe the actual detector is similar. They know which sentences are likely generated by ChatGPT, since that’s literally in their model. They have probably also, to some degree, reverse-engineered the typical output of competing models.
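
      A minimal version of “likely generated by their model” is a perplexity check. The sketch below uses the open GPT-2 weights via Hugging Face transformers as a stand-in, since only OpenAI can score text against their own models:

      ```python
      # Minimal perplexity check: text the model finds "unsurprising" scores low.
      # GPT-2 is an open stand-in here; only OpenAI could run this against the
      # actual ChatGPT weights, which is the point of the comment above.
      import torch
      from transformers import GPT2LMHeadModel, GPT2TokenizerFast

      tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")
      model.eval()

      def perplexity(text: str) -> float:
          ids = tokenizer(text, return_tensors="pt").input_ids
          with torch.no_grad():
              loss = model(ids, labels=ids).loss  # mean cross-entropy per token
          return torch.exp(loss).item()

      # Lower perplexity = more model-like. The gap between human and machine
      # text is noisy, which is one reason this alone makes a poor detector.
      ```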

  • Naich@lemmings.world · 2 months ago

    Total coincidence that this “news” appears about a day after several articles saying the AI bubble is starting to burst.

    • Melvin_Ferd@lemmy.world · 2 months ago

      It is nuts. Who is paying for all these articles, and why are they hell-bent on convincing everyone that AI is to the left what immigrants are to Republicans?

      • doodledup@lemmy.world · 2 months ago

        Why does everything have to be about the USA these days? I’m tired of this joke of a wannabe democracy. I don’t want to hear it. Nobody cares. Just stop and keep it to yourself.

  • Cyteseer@lemmy.world · 2 months ago

    If they aren’t willing to release it, then the situation is no different from them not having one at all. All these claims OpenAI makes about having some system but hiding it are just to try and increase hype to grab more investor money.

    • MagicShel@programming.dev · 2 months ago

      The flaw is in the training to make it corporate-friendly. Everything it says eventually sounds like a sexual harassment training video, regardless of the subject.

    • ArbitraryValue@sh.itjust.works · 2 months ago

      I think the more likely explanation is that being able to filter out AI-generated text gives them an advantage over their competitors at obtaining more training data.

  • vrighter@discuss.tchncs.de · 2 months ago

    It’s only 99.9% accurate because they haven’t released it. As soon as they do, that accuracy will quickly collapse, as usual, because this type of thing is exactly what’s needed to develop tech to defeat it.

  • Etterra@lemmy.world · 2 months ago

    If they have one, and that’s IF, then of course they won’t release it. They’re still trying to find a use case for their stupid toy so that they can charge people for it. Releasing the counter agent would be completely contradictory to their business model. It’s like Umbrella Corp. but even dumber.

  • x00z@lemmy.world · 2 months ago

    ALL conversations are logged and can be used however they want.

    I’m almost certain this “detector” is a simple lookup in their database.

  • Echo Dot@feddit.uk · 2 months ago

    Probably because it doesn’t work. It’s not difficult for OpenAI to see whether any given conversation is one of their conversations. If I were them, I would hash the result of each conversation and then store that hash in a database for quick searching.

    That’s useless for actual AI detection, though.
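
    For concreteness, here’s a toy version of that hash-and-look-up idea; the schema and the whitespace normalization are invented for the example:

    ```python
    # Toy version of the hash-and-look-up idea above.
    import hashlib
    import sqlite3

    db = sqlite3.connect("outputs.db")
    db.execute("CREATE TABLE IF NOT EXISTS output_hashes (digest TEXT PRIMARY KEY)")

    def normalize(text: str) -> str:
        # Collapse case and whitespace so trivial reformatting keeps the same hash.
        return " ".join(text.lower().split())

    def record_output(text: str) -> None:
        digest = hashlib.sha256(normalize(text).encode()).hexdigest()
        db.execute("INSERT OR IGNORE INTO output_hashes VALUES (?)", (digest,))
        db.commit()

    def was_generated(text: str) -> bool:
        digest = hashlib.sha256(normalize(text).encode()).hexdigest()
        return db.execute(
            "SELECT 1 FROM output_hashes WHERE digest = ?", (digest,)
        ).fetchone() is not None

    # Catches only verbatim copy/paste: change a single word and the hash no
    # longer matches, which is why this is useless as general AI detection.
    ```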

  • hendrik@palaver.p3x.de · 2 months ago

    That’s a bad article. What are they reluctant about: releasing the detector, or applying watermarks to the generated text? Do they already do that, or does it not apply to text generated until now? And how would that affect anything else?

    I mean, all the major AI companies promised to do AI ethically. Now they don’t want the one thing that would solve half the issues people have with the technology. Kind of fits with OpenAI 🤔

    • Dr. Moose@lemmy.world · 2 months ago

      They can’t release anything, as watermarks can be reverse-engineered and people would just wise up and tumble the outputs.

      Weirdly, not releasing this tool publicly might be the smartest bet here, as all of these bot farms and idiots just blindly use ChatGPT outputs without any tumbling or safety.

      • hendrik@palaver.p3x.de · 2 months ago

        The issue with that: releasing nothing is even worse than releasing something that could be circumvented. I don’t see this as a valid argument.

        I’m not an expert on text watermarking and how it degrades output. But if they want some stealthy solution that isn’t known to the public, maybe they could attach two watermarks: a simple one that is known to everyone, and an additional, secret one only they know about. It’d be similar to what we do with banknotes. There are some characteristics everyone knows and can use to judge whether it’s fake money, and there are additional secret markings that only the central bank knows about.

        I’m pretty sure a similar thing could be done here. Maybe not for a 280-character tweet, but certainly for other use cases with longer texts. And if it has a 0% false-positive rate, every match helps someone, even if it’s circumventable. I think even a non-perfect solution that helps several thousand people is better than helping no one.
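
        For intuition, here’s a toy two-key version of that idea, borrowing the published “green list” watermarking approach (bias generation toward words a keyed hash marks as green). The keys, hash choice, and 0.5 baseline are all invented for this sketch:

        ```python
        # Toy two-key "green list" watermark check: a keyed hash of each adjacent
        # word pair calls half the vocabulary "green", and a watermarking sampler
        # would favor green words during generation.
        import hashlib

        PUBLIC_KEY = b"published-key"  # anyone could verify with this one
        SECRET_KEY = b"internal-key"   # only the model owner holds this one

        def is_green(prev_word: str, word: str, key: bytes) -> bool:
            digest = hashlib.sha256(key + prev_word.encode() + b"|" + word.encode()).digest()
            return digest[0] % 2 == 0  # keyed coin flip

        def green_fraction(text: str, key: bytes) -> float:
            words = text.lower().split()
            pairs = list(zip(words, words[1:]))
            if not pairs:
                return 0.5
            return sum(is_green(a, b, key) for a, b in pairs) / len(pairs)

        # Unmarked text scores near 0.5 under either key; marked text scores high
        # under the matching key. If someone games the public mark, the secret
        # check still works, like the central bank's hidden banknote features.
        ```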

  • superkret@feddit.org · 2 months ago

    You can just ask ChatGPT if a text was written by it.
    If it is, it’s legally obligated to tell you!