AI Loophole #1: Your GitHub README.md

Elias Griffin@lemmy.world · edit-2 5 months ago

Thanks for all the comments affirming my hard-working, six-month-planned AI honeypot, which endeavours to be a threat to anything that even remotely has the possibility of becoming anti-human. It was within my capability and interest to do, so I did it. This phase may pass and we won’t have to worry, but we aren’t there yet, I believe.

I did some more digging in Perplexity on niche security. This is tangential and speculative, unlike my previous evidenced analysis, but I do think I’m onto something, and maybe others can help me crack it.

I wrote this nice article, https://www.quadhelion.engineering/articles/freebsd-synfin.html, about the FreeBSD sysctl tunable for dropping SYN+FIN packets and its performance impact on web hosting and security, so I searched for that. Many conf files out there contain this directive, and there is performance data in aggregate, but I couldn’t find any data from a controlled test of that tunable alone, so I tested it myself months ago.

I searched for it in Perplexity:

It gave me a contradictorily worded, badly explained answer that reached the correct conclusion, as if written by two different people.
None of the sources it cited said anything about its performance trade-off.
The answers change daily.
One day, one answer cited an identical fork of a gist, with the original author’s name in a comment on the second line. I went on GitHub and notified the original author: https://gist.github.com/clemensg/8828061?permalink_comment_id=5090233#gistcomment-5090233 Then I went back to take a screenshot, maybe 5-10 minutes later, and I could no longer get that gist to come up as a source. I figured it would be consistent, so I didn’t think I needed to take a screenshot right then!

The forked gist was: https://gist.github.com/gspu/ac748b77fa3c001ef3791478815f7b6a

[Contradiction over time] From one day to the next, the stated impact was none, negligible, trivial, or an improvement.

[Errors] Corrected after yesterday, in line with my comments on the web that it actually improves performance, as in my months-old article. It is not minimal or trivial; it’s a big decision with a definite, measurable impact on today’s web stacks. This is an obvious “duh” moment once you realize you are changing the TCP stack, and that is hardly ever negligible.

drop_synfin mainly mitigates fingerprinting, not DoS/DDoS; the DoS it was thinking of is a SYN flood.
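To make the fingerprinting point concrete: a segment with both SYN and FIN set asks to open and close a connection at once, an unusual flag combination that ordinary connection setup doesn’t use, so how a stack reacts to it can help identify the stack. Below is a simplified Python model of the check, not FreeBSD’s kernel code; `make_header`, `tcp_flags` and `would_drop_synfin` are hypothetical names, and I’m assuming the tunable is the `net.inet.tcp.drop_synfin` sysctl (e.g. `net.inet.tcp.drop_synfin=1` in `/etc/sysctl.conf`).

```python
import struct

# TCP flag bits as they appear in byte 13 of the TCP header.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def make_header(flags: int, src: int = 40000, dst: int = 443) -> bytes:
    """Build a minimal 20-byte TCP header (no options) carrying the given flags."""
    data_offset = 5 << 4  # 5 x 32-bit words, stored in the upper nibble of byte 12
    # src port, dst port, seq, ack, offset, flags, window, checksum, urgent pointer
    return struct.pack("!HHIIBBHHH", src, dst, 1, 0, data_offset, flags, 65535, 0, 0)


def tcp_flags(header: bytes) -> int:
    """Return the flag byte (offset 13) of a raw TCP header."""
    if len(header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    return header[13]


def would_drop_synfin(flags: int) -> bool:
    """True when both SYN and FIN are set, the combination drop_synfin discards."""
    return (flags & (SYN | FIN)) == (SYN | FIN)


for name, flags in [
    ("SYN", SYN),                          # normal connection attempt
    ("SYN+ACK", SYN | ACK),                # normal handshake reply
    ("SYN+FIN", SYN | FIN),                # odd probe
    ("SYN+FIN+PSH+URG", SYN | FIN | PSH | URG),
]:
    print(name, would_drop_synfin(tcp_flags(make_header(flags))))
# SYN False
# SYN+ACK False
# SYN+FIN True
# SYN+FIN+PSH+URG True
```

Because the decision happens per segment inside the TCP input path, enabling it changes stack behaviour for every connection, which is why it is worth measuring rather than assuming it is free.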

Does anyone feel like running an experiment here in this thread and asking ChatGPT the same question for me/us?

Elias Griffin@lemmy.world · edit-2 5 months ago

The comments so far aren’t from real people posting how they really feel. They’re an agenda, or automata. Does that tell you I’m over the target or what?

Look, my post is doing really well on the cybersecurity exchanges. So, to all real developers and program managers out there:

Recommend the removal of any “primary logic” functional code examples out of your `README.md`, that’s it.

PSA, Here to help, Elias

Elias Griffin@lemmy.world · 5 months ago

I also just realized why I’m getting heat here, lawsuits.

I just gave legal grounds: the practice was not properly disclosed by Microsoft and was abused by OpenAI, and a README.markdown containing code is software, not speech, integral to the licensed software and therefore covered by that license.

If you find out, like I did, that your technical writing or code from a README is in an AI, is that company perhaps liable?

AlexanderESmith@social.alexanderesmith.com · 5 months ago

Eh. This is not a new argument, and not the first evidence of it. I don’t think you’re gonna be high on their list of retaliation targets, if you register at all (to say nothing of the low-to-middling reach of the fediverse in general).

Hell, just look at photographers/painters v. image generators, or the novel/article/technical authors v. … practically all LLMs really, or any other of a dozen major stories about “AI” absorbing content and spitting out huge chunks of essentially unmodified code/writing/images.

Blaster M@lemmy.world · 5 months ago

So… if you don’t want the world to see your work, why are you hosting it publicly?

Hawk@lemmy.dbzer0.com · 5 months ago

If I copy McDonald’s site one-to-one for my own restaurant and just change the name, you can expect me to be sued.

And yet, their site is available publicly?

AlexanderESmith@social.alexanderesmith.com · 5 months ago

“The world seeing [their] work” is not equal to “Some random company selling access to their regurgitated content, used without permission after explicitly attempting to block it”.

LLMs and image generators that weren’t trained on content wholly owned by the group creating the model are theft.

Not saying LLMs and image generators are innately thievery. It’s like the whole “illegal mp3” argument. mp3s are just files with compressed audio. If they contain copyrighted work and were obtained illegitimately, THEN they’re thievery. Same with content generators.

Victoria Antoinette @lemmy.world · 5 months ago

stealing removes something. copying makes more of it. it’s not theft

AlexanderESmith@social.alexanderesmith.com · 5 months ago

The MPAA and music industry would beg to differ. As would the US courts, as well as any court in a country we share copyright agreements with.

Consider that if a movie uses a scene from another movie without permission, or a music producer uses a melody without permission, or either of them use too much of an existing song without permission, everyone sues everyone else, and they win.

Consider also that if a large corporation uses an individual’s content without permission, we have documented cases of the individual suing, and winning (or settling).

Some other facts to consider:

An mp3 file is not inherently illegal. Nor is a torrent file/tracker/download.
If the mp3 file contains audio you don’t own the rights to, it is illegal, same for the torrent you used to download/distribute it. In the eyes of the law, it’s theft.
A trained LLM or image generation model is not inherently theft, if you only use open-source or licensed/owned content to train it
(at odds in our conversation) What of a model that was trained with content the trainer didn’t own?

In the mp3 example, it’s largely an individual stealing from a large company. On the Internet, this is frequently cheered as the user “sticking it to the man” (unless, of course, you’re an indie creator who can’t support yourself because everyone’s downloading your content for free). Discussions regarding the morality of this have been had, and will be had, for a long time, but its legality is a settled matter: it’s not legal.

In the case of “AI” models, it’s large companies stealing from a huge number of individuals who have no support or established recourse.

You’re suggesting that it’s fine because, essentially, the creators haven’t lost anything. This makes it extremely clear to me that you’ve never attempted to support yourself as a creator (and I suspect you haven’t created anything of meaning in the public domain either).

I guess what it comes down to is this: if creators can be stolen from without consequence, what incentive does anyone have to create anything? Are you going to work your 40-60 hours a week, then come home and work another 20-40 hours to create something for no personal benefit other than the act of creation? Truly, some people will. Most won’t.

Victoria Antoinette @lemmy.world · 5 months ago

people made art, music, and stories long before copyright

Victoria Antoinette @lemmy.world · 5 months ago

this doesn’t address what I said at all.

AlexanderESmith@social.alexanderesmith.com · 5 months ago

The first sentence directly addresses your comment “it’s not theft” with “the law says it is”.

The rest of the post attempts to explain why it is so and some of the moral or ethical discussions surrounding some examples.

Victoria Antoinette @lemmy.world · 5 months ago

the law does not say it is theft.

Eiim@lemmy.blahaj.zone · 5 months ago

Copyright violations ≠ conversion. Those are two completely different sets of laws. If you’re going to argue that legal definitions back you up, at least make sure you know what they are?

Elias Griffin@lemmy.world · 5 months ago

Discussion primer: from my perspective, and potentially that of millions of others, the README is part of the software; it is delivered with the software, whether as a zip, a tar archive, or a git repository. Markdown itself is a specification, so the document can be considered software.

In fact, the README is so integral to the software that you cannot run the software without it.

Conclusion: I think we all think of the README, especially one with examples of your code in it, as code. I have evidence that AI trains on your README even if you specifically tell it not to: block the README, block Markdown files, and it still goes after it. Kinda scary?

I want everyone else to have the evidence I have, Science.
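For what “block the README, block Markdown files” means on the crawler’s side: under RFC 9309 (the Robots Exclusion Protocol), `*` in a robots.txt rule matches any run of characters, a trailing `$` anchors the end of the path, and the most specific matching rule wins, with Allow winning a tie. Below is a simplified Python sketch of that matching, assuming you control the site’s robots.txt; the rules and the `pattern_matches`/`is_allowed` names are hypothetical, and “most specific” is approximated by pattern length. What it shows is that a block is only a string test the crawler runs on its own side.

```python
import re


def pattern_matches(pattern: str, path: str) -> bool:
    """Simplified RFC 9309 path match: '*' matches any run of characters,
    and a trailing '$' means the path must end there."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each '*' back into '.*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, matching robots.txt prefix semantics.
    return re.match(regex, path) is not None


def is_allowed(rules: list, path: str) -> bool:
    """Longest matching rule wins; Allow wins a tie; no match means allowed."""
    best = None  # (pattern length, allowed?)
    for directive, pattern in rules:
        if pattern and pattern_matches(pattern, path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# Hypothetical rules: "block markdowns" and "block readme".
RULES = [
    ("Disallow", "/*.md$"),
    ("Disallow", "/README"),
]

for path in ["/project/README.md", "/README.txt", "/src/main.py"]:
    print(path, is_allowed(RULES, path))
# /project/README.md False
# /README.txt False
# /src/main.py True
```

A compliant crawler runs something like this before each fetch; one that skips the check sees no difference at all, which is consistent with blocked READMEs still being collected.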

catloaf@lemm.ee · 5 months ago

I mean this in the best possible way, but have you ever had any mental health evaluations? I’m not sure if they’re still calling it paranoid schizophrenia, but the way you write makes me concerned.

Elias Griffin@lemmy.world · edit-2 5 months ago

I write as the smartest in the room: passionate, with wisdom and evidence. The way you defame someone like this makes me sure you are not afraid to attack someone’s character with no evidence of anything but your own stupidity and unawareness.

subignition@fedia.io · 5 months ago

I think your problem is here:

You should test this out for yourself as I’m not going to take days or a week making a great presentation of a technical case.

You’ve written a whole lot to try to be convincing but ultimately stopped short of actually proving what you’ve alleged. It looks to me like you are frustrated that no one is taking you at your word and going down this rabbit hole themselves, when the various reputational elements you’re relying on are going to be important only to a minority of users. Burden of proof works how it always has, however.

catloaf@lemm.ee · 5 months ago

This is out of genuine concern, my dude. Your other comment accusing me of not being a real person is positively alarming.

Elias Griffin@lemmy.world · edit-2 5 months ago

Your rapacious, backwards insult of caring is gross and obvious. You called me “my dude” like a teenager who’s chill, and calm, and correct, but just… a child, and wrong in the end. How old are you, child? My Lemmy profile is my name with my Seal, naturally born March 4th, 1974, as Elias Christopher Griffin. I’ve done more in my life than most people do in 10. My mental health is top 3%, as is my intellect.

You are an un-named rando lemmy account named “catloaf” who averages 16 posts a day for the past 4 months with no original posts of your own because you aren’t original.

I make only original posts. You seem nothing like a real person. Want to tell us who you are? What makes you special, outside of the mandated counseling you receive or the data models you intake?

You know what, no one takes what you say seriously, loaf of cat. I certainly didn’t, don’t, and won’t. Here is space for your next hairball.

subignition@fedia.io · 5 months ago

I take back the benefit of the doubt I gave in my earlier reply. This reply is as unhinged as the Navy SEAL copypasta. You need mental health support.

DudeDudenson@lemmings.world · 5 months ago

This really reads like copypasta; if someone told me you were an LLM configured to make anti-AI people look bad, I’d believe them.

AlexanderESmith@social.alexanderesmith.com · 5 months ago

It’s not paranoia if you have proof that they’re stealing your content without permission or compensation.

You come off as an AI bro apologist. What they’re doing isn’t okay.

catloaf@lemm.ee · 5 months ago

Just because they are out to get you doesn’t mean you’re not paranoid, and vice versa.

I have nothing for or against AI/ML as a tool, my issue with it is when companies scrape huge amounts of data in violation of the author’s rights, as in OP’s example. Although I’m not quite sure why he’s keeping code in the README.md file; usually that’s for basic installation and usage, and full examples are kept in full documentation. That said, I highly doubt README.md files are public domain, so they shouldn’t be automatically used as training materials.

AlexanderESmith@social.alexanderesmith.com · 5 months ago

I’m not quite sure whose argument you’re making here. It reads like you agree with OP and me (e.g. “LLMs shouldn’t be using other people’s content without permission”, et al.).

But you called OP paranoid… I assumed because you thought OP thought their content was being used without their permission. And it’s extremely clear that this is what is happening…

What am I missing?

wizardbeard@lemmy.dbzer0.com · 5 months ago

These concepts are not mutually exclusive. You can be right about AI considerably overstepping boundaries and still be exhibiting classic signs of paranoia issues, which OP is.

Their response to people not reacting to this post and their comments is to jump immediately to the idea that they’re being targeted by their designated enemy. That’s not particularly healthy.

I’m worried that AI is becoming the new gangstalking for tech-aligned people predisposed to disordered thinking.

AlexanderESmith@social.alexanderesmith.com · 5 months ago

I agree that their replies are a little… over the top. That’s all kind of a distraction from the main topic though, isn’t it? Do we really need to be rendering armchair diagnoses about someone we know very little about?

I mean, if I posted a legitimate concern, with evidence, and I was dog-piled with a bunch of responses saying I was a nutter, I’d probably go on the defensive too. Some people don’t know how to handle criticism or stressful interactions; it doesn’t mean we should necessarily write them (or their verified concerns) off.

DudeDudenson@lemmings.world · 5 months ago

Frankly, OP replied to his own post multiple times with no prompting whatsoever; just reading through this stuff, I’m concerned about him as well. LLM stuff notwithstanding, and even if he’s right, he seems somewhat obsessed with this in an unhealthy way.

Elias Griffin@lemmy.world · 5 months ago

It all started with this today:

Perplexity AI Is Lying about Their User Agent https://rknight.me/blog/perplexity-ai-is-lying-about-its-user-agent/

wizardbeard@lemmy.dbzer0.com · edit-2 5 months ago

Hey Elias, found some confounding info: looks like Perplexity AI doesn’t respect the methods of blocking scrapers through robots.txt so this might just be an issue with them specifically being assholes.

Couldn’t figure out how to tag you in a comment on the other post, so I’ll edit this comment in a moment with the link.

Link: https://lemmy.world/post/16716107
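Both failure modes raised here, ignoring robots.txt and misreporting the user agent, follow from robots.txt being advisory: a crawler evaluates it against whatever user-agent token it chooses to send. A minimal sketch with Python’s standard `urllib.robotparser`; the robots.txt contents, the example URL and the `PerplexityBot` token are assumptions for illustration, and the browser-style string stands in for a crawler that doesn’t identify itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler's token, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Ask the stdlib parser whether this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


README_URL = "https://example.com/project/README.md"

# A crawler that sends its declared token is refused...
print(allowed("PerplexityBot", README_URL))  # False
# ...but the same crawler sending a browser-style string falls through to "*".
print(allowed("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", README_URL))  # True
```

Nothing in the protocol forces a crawler to call `can_fetch()` at all, so a block only binds crawlers that both run the check and report an honest token.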
