• Halcyon@discuss.tchncs.de
    20 days ago

    They are large LANGUAGE models. It’s no surprise that they can’t solve those mathematical problems in the study. They are trained for text production. We already knew that they were no good at counting things.

      • zbyte64@awful.systems
        20 days ago

        That’s not how you sell fish though. You gotta emphasize how at one time we were all basically fish and if you buy my fish for long enough, those fish will eventually evolve hands to climb!

  • Lvxferre@mander.xyz
    21 days ago

    The fun part isn’t even what Apple said - that the emperor is naked - but why it’s doing it. It’s a nice bullet against all four of its GAFAM competitors.

    • conciselyverbose@sh.itjust.works
      21 days ago

      They’re a publicly traded company.

      Their executives need something to point to to be able to push back against pressure to jump on the trend.

    • jherazob@fedia.io
      21 days ago

      This right here. This isn’t conscientious analysis of tech and intellectual honesty or whatever, it’s a calculated shot at its competitors, who are desperately trying to prevent the generative AI market house of cards from falling.

    • WhatAmLemmy@lemmy.world
      21 days ago

      The results of this new GSM-Symbolic paper aren’t completely new in the world of AI research. Other recent papers have similarly suggested that LLMs don’t actually perform formal reasoning and instead mimic it with probabilistic pattern-matching of the closest similar data seen in their vast training sets.

      WTF kind of reporting is this, though? None of this is recent or new at all, like in the slightest. I am shit at math, but have a high-level understanding of statistical modeling concepts, mostly as of a decade ago, and even I knew this. I recall a stats PhD describing models as “stochastic parrots”; nothing more than probabilistic mimicry. It was obviously no different the instant LLMs came on the scene. If only tech journalists bothered to do a superficial amount of research, instead of being spoon fed spin from tech bros with a profit motive…

      • aesthelete@lemmy.world
        21 days ago

        If only tech journalists bothered to do a superficial amount of research, instead of being spoon fed spin from tech bros with a profit motive…

        This is outrageous! I mean the pure gall of suggesting journalists should be something other than part of a human centipede!

    • misk@sopuli.xyzOP
      21 days ago

      Given the use cases they were benchmarking I would be very surprised if they were any better.

  • emerald@lemmy.blahaj.zone
    21 days ago

    statistical engine suggesting words that sound like they’d probably be correct is bad at reasoning

    How can this be??

  • kingthrillgore@lemmy.ml
    21 days ago

    I feel like a draft landed on Tim’s desk a few weeks ago, which would explain why they suddenly pulled back on OpenAI funding.

  • N0body@lemmy.dbzer0.com
    21 days ago

    The tested LLMs fared much worse, though, when the Apple researchers modified the GSM-Symbolic benchmark by adding “seemingly relevant but ultimately inconsequential statements” to the questions

    Good thing they’re being trained on random posts and comments on the internet, which are known for being succinct and accurate.

  • whotookkarl@lemmy.world
    21 days ago

    Here’s the cycle we’ve gone through multiple times and are currently in:

    AI winter (low research funding) -> incremental scientific advancement -> breakthrough in capabilities as multiple incremental advancements to the scientific models build on each other over time (expert systems, LLMs, neural networks, etc.) -> engineering creates new tech products/frameworks/services based on the new science -> hype for the new tech creates sales and economic activity, research funding, subsidies, etc. -> (for LLMs we’re here) people become familiar with the new tech’s capabilities and limitations through use -> hype spending bubble bursts when the overspend can’t be sustained by “infinite money, line goes up” expectations or by new research breakthroughs -> AI winter -> etc…