doesn’t it follow that AI-generated CSAM can only be generated if the AI has been trained on CSAM?

This article even explicitly says as much.

My question is: why aren’t OpenAI, Google, Microsoft, Anthropic… sued for possession of CSAM? It’s clearly in their training datasets.

  • Free_Opinions@feddit.uk
    17 days ago

    First of all, it’s by definition not CSAM if it’s AI generated. It’s simulated CSAM - no people were harmed in making it. Any harm happened when the training data was created.

    However, it’s not necessary that such content even exists in the training data. Just as ChatGPT can generate sentences it has never seen before, image generators can generate pictures they have never seen before. Of course the results will be more accurate if that’s what the model was trained on, but it’s not strictly necessary. It just takes a skilled person to write the prompt.

    My understanding is that the simulated CSAM content you’re talking about has been made by people running the software locally and providing the training data themselves.