• fubo@lemmy.worldOP
    5 months ago

    All true! And if you want the service to be up 99.99% of the time, you can’t rely on waking someone up to fix it.
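
    For a sense of scale, here is a rough sketch of the downtime budget a 99.99% target leaves. The function name `downtime_budget_minutes`, the 365-day year, and the 30-day month are assumptions for illustration, not anything from a real tool.

```python
# Rough downtime budget for an availability target.
# Assumes a 365-day year and a 30-day month (illustrative simplifications).
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200


def downtime_budget_minutes(availability_pct: float, period_minutes: float) -> float:
    """Return the minutes of downtime allowed in a period at the given availability."""
    return period_minutes * (1 - availability_pct / 100)


yearly = downtime_budget_minutes(99.99, MINUTES_PER_YEAR)
monthly = downtime_budget_minutes(99.99, MINUTES_PER_MONTH)
print(f"99.99% allows about {yearly:.2f} min/year and {monthly:.2f} min/month")
```

    That works out to roughly 52.6 minutes per year, or about 4.3 minutes in a 30-day month, so a single page-and-wake-up cycle can plausibly eat the whole month's allowance.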

    • vrek@programming.dev
      5 months ago

      It might be a side effect of my work environment. I make the equipment that tests electronic medical implants. Theoretically, if a unit put out 1 A of current instead of 1 mA, that could kill a person. Now, on a practical level that’s not possible with our devices, and even if it were, we should be able to identify that unit and prevent it from reaching the field.

      Yes, you are right: if you want 99.99% uptime, you need this stuff. In the field I’m in, a single case escaping test can mean months of engineering time to investigate, root cause analysis to determine the actual cause, expensive fixes in the short term, and even more expensive fixes in the long term to upgrade everything so it never happens again.

      Your boss being unhappy that you missed something is minor. Their boss’s boss’s boss is the real issue. That said, we get regularly audited, both in-house and by external agencies, so it’s unlikely. Multiple lines of defense: have a computer check it, have a person check that the computer actually checked it, have a computer verify that the person actually verified it. Have each of those systems regularly audited and verified to be effective.

      It’s expensive, but it’s what is needed to be in this field.