• tourist@lemmy.world
    4 months ago

    The participants judged GPT-4 to be human a shocking 54 percent of the time.

    ELIZA, which was pre-programmed with responses and didn’t have an LLM to power it, was judged to be human just 22 percent of the time

    Okay, 22% is ridiculously high for ELIZA. I feel like any half sober adult could clock it as a bot by the third response, if not immediately.

    Try talking to the thing: https://web.njit.edu/~ronkowit/eliza.html

    I refuse to believe that those 22% didn’t misunderstand the task or something.

    • Downcount@lemmy.world
      4 months ago

      Okay, 22% is ridiculously high for ELIZA. I feel like any half sober adult could clock it as a bot by the third response, if not immediately.

      I did some stuff with Eliza back then. One time I set up an Eliza database full of insults and hooked it up to my AIM account.

      It went so well that I had to apologize to a lot of people who thought I was drunk or had gone crazy.

      Eliza wasn’t thaaaaat bad.

    • webghost0101@sopuli.xyz
      4 months ago

      The public versions of the AIs used in Turing tests usually have less computing power. The test itself is also often highly restrictive about what questions can be asked and how.

      This is hardly news, because models have passed the test before, and as a result the test has been made more difficult. It says nothing about intelligence, only about the ability to convincingly simulate a human conversation.

    • technocrit@lemmy.dbzer0.com
      4 months ago

      It was a 5 minute test. People probably spent 4 of those minutes typing their questions.

      This is pure pseudo-science.

    • catastrophicblues@lemmy.ca
      3 months ago
      > Hello, I am Eliza. I'll be your therapist today. 
      * Hello! How are you?
      > Would you prefer if I were not ? 
      * Not what?
      > Have you asked such questions before? 
      * No
      > You are being a bit negative and that won't help us move forward. 
      * What are you smoking?
      > Perhaps in your fantasies I am smoking?
      

      Yeah, it took me one message lol
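
ELIZA-style bots work by keyword pattern matching: a rule spots a phrase in the input, captures the rest of the sentence, swaps pronouns ("my" becomes "your"), and pastes the fragment into a canned template. Below is a minimal Python sketch of that mechanism. The rules, templates, and reflection table are hypothetical, reverse-engineered to reproduce the exchange above; they are not taken from Weizenbaum's DOCTOR script or from the NJIT page. When the captured fragment is empty, as with "How are you?", the template is left with a dangling slot, which is one plausible way to end up with "Would you prefer if I were not ?".

```python
import re

# Pronoun reflection applied to the captured fragment, e.g. "my" -> "your".
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my", "are": "am",
}

# Hypothetical rules in the spirit of ELIZA (NOT the original DOCTOR script).
# Order matters: the first matching rule wins, so specific rules come first.
RULES = [
    (re.compile(r".*\bwhat are you\b\s*(.*)", re.IGNORECASE),
     "Perhaps in your fantasies I am {0}?"),
    (re.compile(r".*\bare you\b\s*(.*)", re.IGNORECASE),
     "Would you prefer if I were not {0}?"),
    (re.compile(r"^\s*no\b(.*)", re.IGNORECASE),
     "You are being a bit negative and that won't help us move forward."),
    (re.compile(r".*\?\s*$"),  # fallback for any other question
     "Have you asked such questions before?"),
]


def reflect(fragment: str) -> str:
    """Strip trailing punctuation and swap first/second-person words."""
    words = fragment.strip(" ?!.,").split()
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in words)


def respond(user_input: str) -> str:
    """Return the template of the first matching rule, filled with the capture."""
    for pattern, template in RULES:
        match = pattern.match(user_input)
        if match:
            captured = match.group(1) if match.groups() else ""
            # An empty capture leaves the slot blank: "...I were not ?"
            return template.format(reflect(captured))
    return "Please go on."


if __name__ == "__main__":
    for line in ["Hello! How are you?", "Not what?", "No", "What are you smoking?"]:
        print("*", line)
        print(">", respond(line))
```

There is no model of the conversation at all, only string substitution, which is why a reply can be grammatical nonsense the moment the input doesn't fit a template.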

    • webghost0101@sopuli.xyz
      4 months ago

      To clarify:

      People seem to legit think the jury talks to the bot in real time and can ask about literally whatever they want.

      It’s rather insulting to the scientists who put a lot of thought into organizing a controlled environment to properly test defined criteria.

      • technocrit@lemmy.dbzer0.com
        4 months ago

        It’s rather insulting to the scientists who put a lot of thought into organizing a controlled environment to properly test defined criteria.

        lmao. These “scientists” are frauds. 500 people is not a legit sample size. 5 minutes is a pathetic amount of time. 54% is basically the same as guessing. And most importantly, the “Turing Test” is not a scientific test that can be “passed” with one weak study.

        Instead of bootlicking “scientists”, we should be harshly criticizing the overwhelming tide of bad science and pseudo-science.

        • Kogasa@programming.dev
          4 months ago

          I don’t think the methodology is the issue with this one. 500 people can absolutely be a legitimate sample size. Under basic assumptions about the sample being representative and the effect size being sufficiently large, you do not need more than a couple hundred participants to make statistically significant observations. 54% being close to 50% doesn’t mean the result is inconclusive. With an ideal sample, it means people couldn’t reliably differentiate the human from the bot, which is presumably what the researchers believed is of interest.
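
Whether 54% is "basically the same as guessing" depends on how many judgments it rests on, not on the 500-participant headcount alone. A minimal Python sketch, assuming each judgment is an independent yes/no trial: it tests an observed rate against 50% with the normal approximation and reports a simple (Wald) 95% confidence interval. The counts in the demo (100, 500, 2000) are hypothetical, since the thread doesn't say how many GPT-4 conversations were actually judged.

```python
import math


def z_and_interval(p_hat: float, n: int, p0: float = 0.5, z_crit: float = 1.96):
    """Normal-approximation test of an observed proportion against p0.

    Returns (z statistic, two-sided p-value, Wald 95% confidence interval).
    Assumes n independent Bernoulli judgments.
    """
    se_null = math.sqrt(p0 * (1 - p0) / n)        # standard error under H0
    z = (p_hat - p0) / se_null
    p_value = math.erfc(abs(z) / math.sqrt(2))    # = 2 * (1 - Phi(|z|))
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error for the CI
    return z, p_value, (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)


if __name__ == "__main__":
    # Hypothetical numbers of judged GPT-4 conversations.
    for n in (100, 500, 2000):
        z, p, (lo, hi) = z_and_interval(0.54, n)
        print(f"54% of n={n}: z={z:.2f}, p={p:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
    z, p, _ = z_and_interval(0.22, 100)
    print(f"22% of n=100: z={z:.2f}, p={p:.2g}")
```

Under these assumptions, 54% is indistinguishable from 50% at 100 judgments (p ≈ 0.42), borderline at 500 (p ≈ 0.07), and clearly above it at 2,000, while a 22% rate sits far below chance even at 100 judgments. The Wald interval is just the simplest choice; a Wilson or exact binomial interval behaves better for small n or rates near 0 or 1.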

        • webghost0101@sopuli.xyz
          4 months ago

          The reporting is big clickbait, but that doesn’t mean there is nothing left to learn from the old Turing tests.

          I don’t know what goal they had in mind. It could just as well be “testing how overhyped the Turing test is when manipulated tests are shared with the media”.

          I sincerely doubt it, but I do give them the benefit of the doubt.

  • foggy@lemmy.world
    4 months ago

    Meanwhile, me:

    (Begin)

    [Prints error statement showing how I navigated to a dir, checked a file’s permissions, ran whoami, and triggered the error]

    ChatGPT-4: First, make sure you’ve navigated to the correct directory.

    cd /path/to/file

    Next, check the permissions of the file

    ls -la

    Finally, run the command

    [exact command I ran to trigger the error]

    Me: Stop telling me to do stuff that I have evidently done. My prompt included evidence of me having done all of that already. How do I handle this error?

    (return (begin))

    • massive_bereavement@fedia.io
      4 months ago

      The interrogators seem completely lost and clearly haven’t talked to an NLP chatbot before.

      That said, this gives me the feeling that eventually they could use it to run scams (or more effective robocalls).

    • harrys_balzac@lemmy.dbzer0.com
      4 months ago

      Skynet will get the dumb ones first by getting them to put toxic glue on their pizzas, then the arrogant ones will build the Terminators thanks to reverse psychology.

  • dhork@lemmy.world
    4 months ago

    In order for an AI to pass the Turing test, it must be able to talk to someone and fool them into thinking that they are talking to a human.

    So, passing the Turing Test either means the AI are getting smarter, or that humans are getting dumber.

  • dustyData@lemmy.world
    4 months ago

    The Turing test isn’t actually meant to be a scientific or accurate test. It was proposed as a mental exercise to demonstrate a philosophical argument, mainly in support of the machine input-output paradigm and the black-box construct. It wasn’t meant to say anything about humans either. Running this kind of experiment without any sort of self-awareness is just proof that epistemology is a weak topic in computer science academia.

    Especially since, from psychology, we know that there’s so much more complexity riding on such tests. To name just one example, we know expectations alter perception. A Turing test suffers from a loaded-question problem. If you prompt a person by telling them they’ll talk with a human, or with a computer program, or announce beforehand that they’ll have to decide whether they’re talking with a human or not, or any combination of these, you’ll get different results each time.

    Also, this is not the first chatbot to pass the Turing test. Technically speaking, if a chatbot fools even one human into thinking they’re talking with a person, it has passed the Turing test. That is the extent to which the argument was originally elaborated. Anything beyond that is an alteration added to the central argument out of the author’s self-interest. But this is OpenAI; they’re all about marketing and give fuck all about the science.

    • Kogasa@programming.dev
      4 months ago

      Your first two paragraphs seem to rail against a philosophical conclusion made by the authors by virtue of carrying out the Turing test. Something like “this is evidence of machine consciousness” for example. I don’t really get the impression that any such claim was made, or that more education in epistemology would have changed anything.

      In a world where GPT-4 exists, the question of whether one person can be fooled by one chatbot in one conversation has long since become uninteresting. The question of whether specific models can achieve statistically significant success is maybe a bit more compelling, not because it’s some kind of breakthrough but because it makes a generalized claim.

      Re: your edit, Turing explicitly puts forth the imitation game scenario as a practicable proxy for the question of machine intelligence, “can machines think?”. He directly argues that this scenario is indeed a reasonable proxy for that question. His argument, as he admits, is not a strongly held conviction or rigorous argument, but “recitations tending to produce belief,” insofar as they are hard to rebut, or their rebuttals tend to be flawed. The whole paper was to poke at the apparent differences between (a futuristic) machine intelligence and human intelligence. In this way, the Turing test is indeed a measure of intelligence. It’s not to say that a machine passing the test is somehow in possession of a human-like mind or has reached a significant milestone of intelligence.

      https://academic.oup.com/mind/article/LIX/236/433/986238

      • dustyData@lemmy.world
        4 months ago

        Turing never said anything of the sort, “this is a test for intelligence”. Intelligence and thinking are not the same. Humans have plenty of unintelligent behaviors; that has no bearing on their ability to think. And plenty of animals display intelligent behavior, but that is not evidence of their ability to think. Really, if you know nothing about epistemology, just shut up. Nobody likes your stupid LLMs, the marketing is tiring already, and the copyright infringement, rampant privacy violations, property theft, and insatiable power hunger are not worth it.

  • technocrit@lemmy.dbzer0.com
    4 months ago
    • 500 people - meaningless sample
    • 5 minutes - meaningless amount of time
    • The people bootlicking “scientists” obviously don’t understand science.