
  • 2 Posts
  • 369 Comments
Joined 1 year ago
Cake day: June 5th, 2023

  • Statistical tests are very picky. They were designed by mathematicians in an idealized mathematical vacuum, void of all reality. The method works under those ideal conditions, but when you take it and apply it in messy reality where everything is flawed, you can run into trouble. In simple cases it’s easy to abide by the assumptions of the statistical test, but as your experiment gets more and more complicated, there are more and more potholes for you to dodge. In the best case, your messy data is just barely clean enough that you can be reasonably sure the test still works well enough, and you can sort of trust the result up to a point.

    However, when you know for a fact that some of the underlying assumptions of the statistical test are clearly being violated, all bets are off. Sure, you get a result, but who in their right mind would ever trust that result?

    If the test says that the medicine works, there’s a clear financial incentive to believe it and start selling those pills. If it says that the medicine is no better than placebo, there’s a similar incentive to reject the result and demand more experiments. Most of that debate goes out the window if you can be reasonably sure that the data is good enough and the result of your statistical test is reliable.
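To make the assumptions point concrete: when a parametric test’s assumptions (normality, equal variances) are in doubt, one common assumption-light alternative is a permutation test, which only assumes the observations are exchangeable under the null hypothesis. Here is a minimal sketch in Python; the function name and the drug/placebo numbers are made up purely for illustration, not taken from any real trial.

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sample permutation test on the absolute difference of means.

    Makes no normality or equal-variance assumption; it only assumes
    the observations are exchangeable under the null hypothesis.
    Returns an estimated two-sided p-value.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Relabel the pooled data at random and recompute the statistic.
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical drug-vs-placebo outcome scores (invented numbers).
drug = [5.1, 6.2, 5.8, 7.0, 6.5, 5.9]
placebo = [4.8, 5.0, 5.2, 4.9, 5.3, 5.1]
p = permutation_test(drug, placebo)
```

The trade-off is computational cost and a weaker set of guarantees, but when you already know the parametric assumptions are violated, a resampling result like this is usually easier to defend than a t-test run on unsuitable data.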


  • Yeah, that’s the thing with placebo. It’s surprisingly effective, and separating the psychological effect from the actual chemistry can be very tricky. If most participants can correctly identify whether they’re being fed the real drug or a placebo, it becomes impossible to figure out how much each effect contributes to the end result. Ideally, you would only use medicine that is effective without relying on the placebo effect.

    Imagine if all medicines depended heavily on the placebo effect: how would you treat patients who are in a coma or otherwise unconscious?




  • I think of an LLM as a tool, just like a drill or a hammer. If you buy or rent these tools, you pay the tool company. If you use the tools to build something, your client pays you for that work.

    Similarly, OpenAI can charge me for extensive use of ChatGPT. I can use that tool to write a book, but it’s not 100% AI work. I need to spend several hours prompt crafting, structuring, reading and editing the book in order to make something acceptable. I don’t really act as a writer in this workflow, but more like an editor or a publisher. When I publish and sell my book, I’m entitled to some compensation for the time and effort that I put into it. Does that sound fair to you?





  • Here’s an analogy that can be used to test this idea.

    Let’s say I want to write a book, but I totally suck as an author and have no idea how to write a good one. To get some guidelines and inspiration, I go to the library and read a bunch of books. Then I take those ideas and smash them together to produce a mediocre book that any publisher would refuse. Alternatively, I could buy those books; the end result would be the same, except it would cost me a lot more. Either way, this sort of learning-and-writing procedure is entirely legal, and people have been doing it for ages. Even if my book looks and feels a lot like LOTR, it probably won’t be easy to sue me unless I copy large parts of it word for word. Blatant plagiarism might result in a lawsuit, but I guess that isn’t what the AI training data debate is all about, now is it?

    However, if I pirated those books, that could result in some trouble. Still, someone would need to read my miserable book, find a suspicious passage, check my personal bookshelf and everything I have ever borrowed, etc. That way, it might be possible to prove that I could not have come up with a specific line of text except by pirating some book. If an AI is trained on pirated data, that’s obviously something worth the debate.