Along with @maciejwolczyk, we’ve been training a neural network that learns to play NetHack, an old roguelike game (shown in the screenshot). Recently, something unexpected happened.

  • RecallMadness@lemmy.nz
    1 month ago

    I was expecting some sort of “AI discovers new bug in 30-year-old software”… cool, I’m excited.

    Then they were talking about how the bug was persistent, and I was even more intrigued: “is the bug some weird emergent behaviour corrupting state somewhere?”

    Nope, just another example of a shit-in, shit-out data model.

  • There was a speedrunner who either set a world record or lost a world record run because of a random bit flip caused by space radiation. That’s gotta be worse than just not knowing the deeper mechanics of a very old game.

  • ArbitraryValue@sh.itjust.works
    1 month ago

    Their problem:

    So apparently NetHack has a mechanic that slightly changes how the game plays every time it’s a full moon according to your system clock.

    The model wasn’t trained on a full moon. They had a system to set up the environment for replicable results but it didn’t include modifying the system time.
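To make that dependency concrete, here is a minimal Python sketch of a date-parameterized moon-phase check. It ports, from memory, the integer approximation used by NetHack's `phase_of_the_moon()` in `hacklib.c` (0 = new moon, 4 = full moon); the real function reads the host clock through `localtime()`, which is exactly the input the environment setup didn't control. Treat the port as an assumption to verify against the C source; `moon_phase` and `full_moon_dates` are hypothetical names, not part of NetHack or any library.

```python
from datetime import date, timedelta

NEW_MOON = 0
FULL_MOON = 4


def moon_phase(day: date) -> int:
    """Return a phase in 0..7 (0 = new, 4 = full) for the given date.

    Assumed port of NetHack's phase_of_the_moon() integer approximation,
    written from memory. Unlike the C original, the date is a parameter
    rather than being read from the system clock.
    """
    tm_year = day.year - 1900               # struct tm counts years from 1900
    tm_yday = day.timetuple().tm_yday - 1   # struct tm is 0-based; Python's tm_yday is 1-based
    goldn = (tm_year % 19) + 1              # position in the 19-year Metonic cycle
    epact = (11 * goldn + 18) % 30          # approximate moon age on January 1
    if (epact == 25 and goldn > 11) or epact == 24:
        epact += 1
    # ~6 lunar months per 177 days, split into 8 reported phases of 22 days; +11 rounds
    return ((((tm_yday + epact) * 6 + 11) % 177) // 22) & 7


def full_moon_dates(start: date, n_days: int) -> list[date]:
    """Dates in [start, start + n_days) that the approximation reports as full moon."""
    days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in days if moon_phase(d) == FULL_MOON]
```

A reproducible environment would pass one fixed date into checks like this instead of letting the host clock decide; picking a date from `full_moon_dates` lets you exercise the full-moon path deliberately rather than by accident of when training happens to run.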

  • dohpaz42@lemmy.world
    1 month ago

    This is a common problem when testing time-based software, and it’s similarly why database-driven software is difficult to test. You have to put a lot of effort into setting up a good test environment and genuinely understand the software and its dependencies.

  • slurpyslop@kbin.social
    1 month ago

    it was the WEIRDEST bug in our chess ai you guys

    the pawn captured another pawn that was NEXT TO IT

    like what’s going on there

  • tal@lemmy.today
    1 month ago

    Not my bug and not CS, but I think the most difficult bug(s) I’ve read about are in the American Mark 14 torpedo in World War II. It came down to a combination of factors: a constrained testing budget before the war, production that fell far short of demand (and so left few torpedoes to spare for testing), difficulty observing production units in actual operation (it’s a torpedo, and the target probably isn’t too amenable to you looking at the thing if it doesn’t work well), secrecy, cutting-edge technology, several other problems, multiple modes of operation (including both a contact fuze and a magnetic proximity fuze), and multiple bugs that tended to mask or affect each other, including specifically:

    • A tendency to run deeper than set (sometimes too deep to hit or even detect a ship)

    • A tendency for a critical firing pin to bend on impact when the torpedo struck a ship at roughly right angles, but not when it struck at a glancing angle; if the pin bent, the torpedo would not detonate.

    • Testing was done in the Atlantic, but most use was in the Pacific. It turns out that Earth’s magnetic field is not uniform; it varies enough to throw off magnetic fuzes, causing premature explosions or failures to explode.

    …led to the US fighting a heavily naval war, where the main weapon for sinking major ships was the torpedo, but where that torpedo wasn’t really functional for something like 18 months of fighting.

    Wikipedia has a somewhat longer version.

    This long explanation is probably the best I’ve read.

  • nomad@infosec.pub
    1 month ago

    Reminds me of a production bug we could not replicate for the life of me.

    The condition logically could not be reached. Impossible.

    Turns out, in production we had two threads per process, and one of them would monkey-patch a function in the shared process space using a locking mechanism that wasn’t thread-safe.

    That took several days to find.
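The commenter doesn't share the actual code, so this is not a reproduction of their bug. It is a minimal Python sketch, with hypothetical names (`shared`, `is_allowed`, `run_in_thread`), of the underlying hazard: a monkey patch applied in one thread rewrites an attribute in the shared process space, so every other thread sees it too, and an "impossible" branch becomes reachable. Keeping the override in `threading.local` is shown as one way to scope a patch to a single thread.

```python
import threading
import types

# Hypothetical stand-in for a module whose function gets monkey-patched.
# Module-level attributes live in the shared process space: every thread sees them.
shared = types.SimpleNamespace(is_allowed=lambda n: n >= 0)


def run_in_thread(fn, *args):
    """Run fn(*args) in a new thread, wait for it, and return its result."""
    result = []
    worker = threading.Thread(target=lambda: result.append(fn(*args)))
    worker.start()
    worker.join()
    return result[0]


def leaky_patch_demo():
    """Patch the shared function in this thread and observe it from another."""
    original = shared.is_allowed
    shared.is_allowed = lambda n: True  # meant as a local hack...
    try:
        # ...but the other thread now takes the "impossible" branch too.
        return run_in_thread(lambda: shared.is_allowed(-1))
    finally:
        shared.is_allowed = original  # restore for every thread


# One way to scope an override to a single thread: keep it in threading.local.
_local = threading.local()


def is_allowed(n):
    """Use this thread's override if one is set, else the shared function."""
    override = getattr(_local, "override", None)
    return override(n) if override is not None else shared.is_allowed(n)


def local_patch_demo():
    """Override in this thread only; return (this thread's answer, another thread's)."""
    _local.override = lambda n: True
    try:
        return is_allowed(-1), run_in_thread(is_allowed, -1)
    finally:
        _local.override = None
```

Wrapping the patch in a lock would only help if every caller also acquired that lock; code that calls the shared function directly bypasses it, which is why scoping the override per thread is often the simpler fix.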