• 0 Posts
  • 18 Comments
Joined 1 year ago
Cake day: June 15th, 2023


  • Humans are not generally allowed to do what AI is doing! You talk about copying someone else’s “style” because you know that “style” is not protected by copyright, but that is a false equivalence. An AI is not copying “style”; it is copying every discernible pattern of its input. It is just as likely to copy Walt Disney’s drawing style as it is to copy the design of Mickey Mouse. We’ve seen countless examples of AIs copying characters, verbatim passages of text and snippets of code.

    Imagine a person who copied Mickey Mouse’s character design and got sued for copyright infringement. In court, their defense is that they downloaded copies of the original works without permission and studied them for the sole purpose of imitating them. They would be admitting that every perceived similarity is intentional. Do you think they would not be found guilty of copyright infringement? AI is this example taken to the extreme. It’s not just creating something similar; it is by design trying to maximize the similarity of its output to its training data. It is being the least creative that is mathematically possible. The AI’s only trick is that it threw so much stuff into its mixer of training data that you generally can’t trace the output back to a specific input. But the math is clear. And while it’s obvious that no sane person will use a copy of Mickey Mouse just because an AI produced it, the same cannot be said for characters from lesser-known works, passages from obscure books, and code snippets from small free software projects.

    In addition to the above, we allow humans to engage in potentially harmful behavior for various reasons that do not apply to AIs.

    • “Innocent until proven guilty” is fundamental to our justice systems. The same does not apply to inanimate objects. E.g. a firearm is restricted because of the danger it poses even if it has not been used to shoot someone. A person is only liable for the damage they have caused, never for their potential to cause it.
    • We care about people’s well-being. We would not ban people from enjoying art just because they might copy it; that would be sacrificing too much. However, no harm is done to an AI when it is prevented from being trained, because an AI is not a person with feelings.
    • Human behavior is complex and hard to control. A person might unintentionally copy protected elements of works when influenced by them, but that is hard to tell in most cases. An AI has the sole purpose of copying patterns, with no other input.

    For all of the above reasons, we choose to err on the side of caution when restricting human behavior, but we have no reason to do the same for AIs, or anything inanimate.

    In summary, we do not allow humans to do what AIs are doing now and even if we did, that would not be a good argument against AI regulation.





  • lsblk is just lacking a lot of information and creating a false impression of what is happening. I did a bind mount to try it out.

    sudo mount -o ro --bind /var/log /mnt
    

    This mounts /var/log to /mnt without making any other changes. My root partition is still mounted at / and fully functional. However, all that lsblk shows under MOUNTPOINTS is /mnt. There is no indication that it’s just /var/log that is mounted and not the entire root partition. There is also no mention at all of /. findmnt shows this correctly. Omitting all irrelevant info, I get:

    TARGET                                                SOURCE                 [...]
    /                                                     /dev/dm-0              [...]
    [...]
    └─/mnt                                                /dev/dm-0[/var/log]    [...]
    

    Here you can see that the same device is used for both mountpoints and that it’s just /var/log that is mounted at /mnt.
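
    For contrast, the comparison above boils down to something like the following two commands (the exact column lists are just my choice; adjust them to taste):

    # lsblk only tells you that the device has /mnt among its mountpoints,
    # with no hint that just the /var/log subtree is bound there:
    lsblk -o NAME,MOUNTPOINTS

    # findmnt names the source of every mountpoint, including the bound
    # subdirectory in brackets (the /dev/dm-0[/var/log] entry above):
    findmnt -o TARGET,SOURCE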

    Snap is probably doing something similar. It is mounting a specific directory into the directory of the firefox snap. It is not using your entire root partition and it’s not doing something that would break the / mountpoint. This by itself should cause no issues at all. You can see in the issue you linked as well that the fix to their boot issue was something completely unrelated.
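
    If you want to see what the firefox snap actually has mounted where, the same tool can answer that (the grep pattern is just a guess at how the mount targets are named on your system):

    findmnt -o TARGET,SOURCE | grep -i firefox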



  • If you have a large enough bank roll and continuously double your bet after a loss, you can never lose without a table limit.

    Unless your bankroll is infinite, you always lose in the average case. My math was just an example to show the point with concrete numbers.

    In truth it is trivial to prove that there is no winning strategy in roulette. If a strategy is just a series of bets, then the expected value of the strategy is the sum of the expected values of the bets. Every bet in roulette has a negative expected value. Therefore, every strategy has a negative expected value as well. I’m not saying anything ground-breaking; you can read a better write-up of this idea in the Wikipedia article.
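
    As a quick illustration of that last point, here is the expected value of a single $1 even-money bet (e.g. on red), assuming the usual single-zero wheel where 18 of the 37 numbers win:

    echo '18/37 - 19/37' | bc -l
    # ≈ -0.027, i.e. every $1 bet on red loses about 2.7 cents on average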

    If you don’t think that’s true, you are welcome to show your math which proves a positive expected value. Otherwise, saying I’m “completely wrong” means nothing.


  • So help me out here, what am I missing?

    You’re forgetting that not all outcomes are equal. You’re just comparing the probability of winning with the probability of losing. But when you lose, you lose much more. If you calculate the expected outcome you will find that it is negative by design. Intuitively, that means that if you follow this strategy, the one time you lose will cost you more than the money you made all the other times you won.

    I’ll give you a short example so that we can calculate the probabilities relatively easily. We make the following assumptions:

    • You have $13, which means you can only make 3 bets: $1, $3, $9
    • The roulette wheel has a single 0 (this is the best-case scenario for the player). So there are 37 numbers and only 18 of them are red, which gives red an 18/37 chance to win. The zero is why the math always works out in the casino’s favor.
    • You will play until you win once or until you lose all your money.

    So how do we calculate the expected outcome? These outcomes are mutually exclusive, so if we can compute the (gain * probability) of each one, we can sum them together. So let’s see what the outcomes are:

    • You win on the first bet. Gain: $1. Probability: 18/37.
    • You win on the second bet. Gain: $2 (the $3 you win minus the $1 you already lost). Probability: 19/37 * 18/37 (lose once, then win once).
    • You win on the third bet. Gain: $5 (the $9 you win minus the $4 you already lost). Probability: (19/37) ^ 2 * 18/37 (lose twice, then win once).
    • You lose all three bets. Gain: -$13. Probability: (19/37) ^ 3 (lose three times).

    So the expected outcome for you is:

    $1 * (18/37) + $2 * (19/37) * (18/37) + $5 * (19/37)^2 * (18/37) - $13 * (19/37)^3 ≈ -$0.1328

    So you lose a bit more than $0.13 on average. Notice how the probabilities of winning $1, $2 or $5 are much higher than the probability of losing $13, but the amount you lose is much bigger.
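
    If you want to check the arithmetic yourself, the whole sum can be fed to bc using the gains and probabilities listed above:

    echo '1*(18/37) + 2*(19/37)*(18/37) + 5*(19/37)^2*(18/37) - 13*(19/37)^3' | bc -l
    # prints roughly -.1328, i.e. an expected loss of about 13 cents per attempt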

    Others have mentioned betting limits as a reason you can’t do this. That’s wrong. There is no winning strategy. The casino always wins given enough bets. Betting limits just keep the short-term losses under control, making the business more predictable.


  • Im not 100% comfortable with AI gfs and the direction society could potentially be heading. I don’t like that some people have given up on human interaction and the struggle for companionship, and feel the need to resort to a poor artificial substitute for genuine connection.

    That’s not even the scary part. What we really should be uncomfortable with is this very closed technology having so much power over people. There’s going to be a handful of gargantuan immoral companies controlling a service that the most emotionally vulnerable people will become addicted to.



  • Exactly this. I can’t believe how many comments I’ve read accusing the AI critics of holding back progress with regressive copyright ideas. No, the regressive ideas are already there, codified as law, holding the rest of us back. Holding AI companies accountable for their copyright violations will force them to either push to reform the copyright system completely, or to change their practices for the better (free software, free datasets, non-commercial uses, real non-profit orgs for the advancement of the technology). Either way we have a lot to gain by forcing them to improve the situation. Giving AI companies a free pass on the copyright system will waste what is probably the best opportunity we have ever had to improve the copyright system.


  • LLMs can do far more

    What does this mean? I don’t care what you (claim) your model “could” do, or what LLMs in general could do. What we’ve got are services trained on images that make images, services trained on code that write code etc. If AI companies want me to judge the AI as if that is the product, then let them give us all equal and unrestricted access to it. Then maybe I would entertain the “transformative use” argument. But what we actually get are very narrow services, where the AI just happens to be a tool used in the backend and not part of the end product the user receives.

    Can it write stories in the style of GRRM?

    Talking about “style” is misleading because “style” cannot be copyrighted. It’s probably impractical to even define “style” in a legal context. But an LLM doesn’t copy styles, it copies patterns, whatever they happen to be. Some patterns are copyrightable, e.g. a character name and description. And it’s not obvious what is OK to copy and what isn’t. Is a character’s action copyrightable? It depends: is the action opening a door, or is it throwing a magical ring into a volcano? If you tell a human to write something in the style of GRRM, they would try to match the medieval fantasy setting and the mood, but they would know to make their own characters and story arcs. The LLM will parrot anything with no distinction.

    Any writer claiming to be so unique that they aren’t borrowing from other writers is full of shit.

    This is a false equivalence between how an LLM works and how a person works. The core ideas expressed here are that we should treat products and humans equivalently, and that how an LLM functions is basically how humans think. Both of these are objectively wrong.

    For one, humans are living beings with feelings. The entire point of our legal system is to protect our rights. When we restrict human behavior, it is justified because it protects others; at least that’s the formal reasoning. We (mostly) judge people based on what they’ve done and not on what we know they could do. This is not how we treat products, and that makes sense. We regulate weapons because they could kill someone, but we only punish a person after they have committed a crime. Similarly, a technology designed to copy can be regulated, whereas a person copying someone else’s works can be (and often is) punished for it, but only after it is proven that they did it. Even if you think that products and humans should be treated equally, it is a fact that our justice system doesn’t work that way.

    People also have many more functions and goals than an LLM. At this point it is important to remember that an LLM does literally one thing: for every word it writes, it chooses the one that would “most likely” appear next based on its training data. I put “most likely” in quotes because it sounds like a form of prediction, but actually it is based only on the occurrences of words in the training data. It has nothing else to incorporate into its output, and it has no other need. It doesn’t have ideas or a need to express them.

    An LLM can’t build upon or meaningfully transform the works it copies; its only trick is mixing together enough data to make it hard for you to determine the sources. That can sometimes make it look original, but the math is clear: it is always trying to maximize the similarity to the training data, if you consider choosing the “most likely” word at every step to be a metric of similarity. Humans are generally not trying to maximize their works’ similarity to other people’s works. So when a creator is inspired by another creator’s work, we don’t automatically treat that as an infringement.

    But even though comparing human behavior to LLM behavior is wrong, I’ll give you an example to consider. Imagine that you write a story “in the style of GRRM”. GRRM reads this and thinks that some of the similarities are a violation of his copyright, so he sues you. So far it hasn’t been determined that you’ve done something wrong. But you go to court and say the following:

    • You pirated the entirety of GRRM’s works.
    • You studied them only to gain the ability to replicate patterns in your own work. You have no other use for them, not even personal satisfaction gained from reading them.
    • You clarify that replicating the patterns is achieved by literally choosing your every word to be the one that you determined GRRM would most likely use next.
    • And just to be clear, you don’t know who GRRM is or how he talks. Your understanding of what word he would most likely use is based solely on the pirated works.
    • You had no original input of your own.

    How do you think the courts would view any similarities between your work and his? You basically confessed that anything that looks like a copy is definitely a copy. Are these characters with similar names and descriptions to GRRM’s characters just a coincidence? Of course not; you just explained that you chose those names specifically because they appear in GRRM’s works.


  • Already seeing people come in to defend these suits. I just see it like this: AI is a tool, much like a computer or a pencil are tools. You can use a computer to copyright infringe all day, just like a pencil can. To me, an AI is only going to be plagiarizing or infringing if you tell it to. How often does AI plagiarize without a user purposefully trying to get it to do so? That’s a genuine question.

    You are misrepresenting the issue. The issue here is not whether a tool merely happens to be usable for copyright infringement in the hands of a malicious entity. The issue here is whether LLM outputs are just derivative works of their training data. This is something you cannot compare to tools like pencils and PCs, which are much more general-purpose and which are not built on stolen copyrighted works. Notice also how AI companies bring up “fair use” in their arguments. This means they are not arguing that they aren’t using copyrighted works without permission, nor that the output of the LLM contains no copyrighted part of its training data (they can’t do that, because you can’t trace the flow of data through an LLM), but rather that their use of the works is novel enough to be an exception. And that is a really shaky argument when their services are actually not novel at all. In fact, they are designing services that are as close as possible to the services provided by the original work creators.





  • PragerU is a known disinformation platform. That link is worthless.

    The ongoing war in Gaza, is HAMAS against Israel.

    And what about the Palestinian lands that are occupied and the Palestinians that were uprooted from there? What about the Palestinians that have been killed by Israel? The recent events might have been HAMAS, but historically this is a Palestine-Israel conflict. If you can’t be bothered to learn and understand the context, why comment at all?