• Limonene@lemmy.world
    3 months ago

    Generative AI models and their outputs are derived products of their training data. I mean this ethically, not legally; I’m not a copyright lawyer.

    Using the output for personal viewing (advice, science questions, or jacking off to AI porn you requested) is weird but ethical. It’s equivalent to pirating a movie to watch at home.

    But as soon as you show someone else the output, I consider it theft without attribution. If you generate a meme image, you’re failing to attribute the artists whose work trained the AI without permission. If you generate code, that code infringes the numerous open source licenses of the training data, by failing to attribute it.

    Even a simple lemmy text post generated by AI is derived from thousands of unattributed novels.

    • shoo@lemmy.world
      3 months ago

      What a weird distinction. So if I get a prompt to make a particular scene in a particular artist’s distinct style: not stealing. But if I share that prompt (and maybe even some seed info) with a friend, is that stealing? If I take a picture of the generated content, stealing? If someone takes it off my laptop without my knowledge, are they stealing from me or the artist?

      My viewpoint is that information wants to be free, and trying to restrict it is a losing battle (as shown by AI training). The concept of IP is tenuous at best, but I do recognize that artists need to eat in our capitalist reality. But once you make something and set it free to the world, you inherently lose some ownership of it. Getting mad at the tech itself for the economic injustice is silly; there are plenty more important things to worry about in our hellscape.

      • backgroundcow@lemmy.world
        3 months ago

        Copyright law is more or less always formulated as limits on the right to redistribute content, not on how it is used. Hence, it isn’t a particularly strange position to take that one should be allowed to do whatever one wants with gen AI in the private confines of one’s home, and it is only at the moment you start to redistribute content that we have to start asking the difficult questions: what is, and what is not, a derivative work of the training data? What ethical limitations, if any, should apply when we use an algorithm to effortlessly copy “a style” that another human has spent lots of effort to develop?

        • shoo@lemmy.world
          3 months ago

          That makes sense wrt redistribution, but the original comment limited itself to the ethical problem and not the legal problem. I just don’t see how it makes sense in that context, because it’s entirely unclear who owns the work; that’s the nature of the technology.

          If I train a model on the work of 1000 artists, each of them contributes some fractional amount to each weight. When that model generates an image, it’s combining a pseudorandom human token input with the weights and some random seed info.

          If I provide a prompt of my own making, am I stealing 1/1000 of the content from each artist? Is the result 1/3 mine from my token input? Is the result 100% the property of whoever trained the model? Do we need to trace the traversal of the weights and sum the ownership of each artist based on their contribution to that weight? Is it nobody’s due to the sheer number of random steps that convert the input intent to the final result?
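          The "fractional amount to each weight" idea can be illustrated with a deliberately crude toy. This is only a sketch: the `train` function and the made-up numeric "works" are hypothetical, and real models are trained by gradient descent, not by averaging. It only shows that when every work feeds every weight, removing one artist out of 1000 shifts each weight slightly.

```python
def train(works):
    """Toy 'model' (an assumption for illustration): each weight is the
    average of one feature over all works in the training set."""
    n = len(works)
    dims = len(works[0])
    return [sum(w[i] for w in works) / n for i in range(dims)]


# Made-up data: 1000 "artists", each represented by 4 numeric features.
works = [[((a * 7 + i * 3) % 10) / 10 for i in range(4)] for a in range(1000)]

weights_all = train(works)
weights_without_first = train(works[1:])  # drop one artist and retrain

# How far each weight moved when a single artist was removed.
shift = [abs(x - y) for x, y in zip(weights_all, weights_without_first)]
print(shift)
```

          In this toy, every weight moves a little and none moves a lot, which is exactly why the ownership questions above have no obvious per-artist answer.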

    • gmtom@lemmy.world
      3 months ago

      No, gen AI pictures are not derived works of their training data. They are separate processes. The algorithm that actually generates the image has no knowledge of the training data.

        • gmtom@lemmy.world
          3 months ago

          The algorithms involved in the actual creation of the images are not the ones actually trained on the data. So it’s not at all accurate to claim they are derived.

            • gmtom@lemmy.world
              3 months ago

              Not directly, no.

              The training data trains an algorithm that effectively just describes an image it sees (which BTW is super useful for blind people) and gives a score for each keyword.

              Then the actual generative part takes a random background, tries to denoise it into something recognisable, then shows it to the first algorithm, which gives it a score on how closely it resembles the prompt. Then it does some fancy maths, performs another denoising cycle, and gets another score from the first algorithm; more maths, another cycle, etc., until it spits out an image that matches the prompt.

              So the algorithm that generates the image has no data from the training process whatsoever.
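              The loop described above (start from noise, change the image, ask a scorer how close it is, repeat) can be sketched as a toy. Everything here is an assumption for illustration: `score` is a hand-written stand-in for the trained scoring model, the "image" is a short list of numbers, `target` stands in for the prompt, and random-perturbation hill climbing stands in for the denoising step. It is not how any particular image generator is implemented.

```python
import random


def score(candidate, target):
    """Hypothetical stand-in for the scoring model: higher means the
    candidate is closer to what the 'prompt' (target) asks for."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


def generate(target, steps=200, step_size=0.1, seed=0):
    """Start from random noise and keep only changes the scorer prefers."""
    rng = random.Random(seed)
    image = [rng.uniform(-1.0, 1.0) for _ in target]  # random starting noise
    best = score(image, target)
    for _ in range(steps):
        # Propose a slightly altered image (stand-in for one denoising cycle).
        proposal = [p + rng.gauss(0.0, step_size) for p in image]
        s = score(proposal, target)
        if s > best:  # keep the change only if the scorer rates it higher
            image, best = proposal, s
    return image, best


prompt_target = [0.2, -0.5, 0.9]  # made-up stand-in for "the prompt"
_, start_score = generate(prompt_target, steps=0)
_, final_score = generate(prompt_target, steps=200)
print(start_score, final_score)
```

              In this sketch the generating loop never reads any training data itself; the only thing it ever receives is the number returned by `score`.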

              • petrol_sniff_king@lemmy.blahaj.zone
                3 months ago

                “So the algorithm that generates the image has no data from the training process whatsoever.”

                It gets a, uh, score. You wrote that yourself, I don’t know how you could forget.

                • gmtom@lemmy.world
                  3 months ago

                  But that’s not the same as a derivative. That’s like saying a chart of which art styles were most popular in every decade is a derivative of every work in that survey, because those works were used to create the data being presented.

                  • petrol_sniff_king@lemmy.blahaj.zone
                    3 months ago

                    But… that is derivative. You can’t know which styles were most popular in a decade without looking at the styles popular in that decade. Such a chart must change if the data it’s built from changes.
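                    A tiny sketch of that last point, using a made-up survey (the function name, decades and style counts are all invented for illustration): the chart is computed from the works, so adding works changes it.

```python
from collections import Counter


def most_popular_by_decade(works):
    """works: list of (decade, style) pairs. Returns {decade: most common style}."""
    counts = {}
    for decade, style in works:
        counts.setdefault(decade, Counter())[style] += 1
    return {decade: c.most_common(1)[0][0] for decade, c in counts.items()}


# Invented survey data, not real art-history statistics.
survey = [
    (1960, "pop art"),
    (1960, "pop art"),
    (1960, "minimalism"),
    (1980, "neo-expressionism"),
]

chart = most_popular_by_decade(survey)
# Add two more works to the survey and rebuild the chart.
changed = most_popular_by_decade(survey + [(1960, "minimalism"), (1960, "minimalism")])
print(chart)
print(changed)
```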