How Cloudflare uses generative AI to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect “no crawl” directives.
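Cloudflare hasn’t published the code, but the general shape is an old tarpit trick: serve plausible-looking pages whose links lead only to more generated pages, so a non-compliant crawler burns its crawl budget going nowhere. Here’s a toy sketch of that idea in Python, with canned filler standing in for the LLM-generated text (the port, paths, and filler are all invented; this is not Cloudflare’s implementation):

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Canned filler standing in for generated text in this toy version.
FILLER = [
    "A comprehensive overview of the topic follows.",
    "Recent findings suggest several promising directions.",
    "The sections below summarize the relevant background.",
]

class MazeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every page links to five more randomly named pages, so a crawler
        # that keeps following links never runs out of junk to fetch.
        links = " ".join(
            f'<a href="/maze/{random.getrandbits(32):08x}">continue</a>'
            for _ in range(5)
        )
        body = f"<html><body><p>{random.choice(FILLER)}</p>{links}</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode())

HTTPServer(("localhost", 8000), MazeHandler).serve_forever()
```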
People complain about AI possibly being unreliable, then actively root for things that are designed to make them unreliable.
I mean, this is just designed to thwart AI bots that refuse to follow the robots.txt rules of people who specifically blocked them.
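For context: robots.txt is a purely voluntary protocol. Sites opt out of AI crawlers with per-user-agent stanzas (e.g. “User-agent: GPTBot” followed by “Disallow: /”), and a well-behaved crawler checks those directives before every fetch. A minimal sketch of that check using Python’s standard library, with a placeholder bot name and URLs:

```python
from urllib.robotparser import RobotFileParser

# What a compliant crawler does before fetching a page: read the site's
# robots.txt and honor it. The bots this maze targets skip this step.
# "ExampleBot" and the URLs are placeholders.
robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()  # fetch and parse the site's directives

if robots.can_fetch("ExampleBot", "https://example.com/some/page"):
    print("allowed; go ahead and fetch")
else:
    print("disallowed; a well-behaved crawler stops here")
```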
I find this amusing. I had a conversation with an older relative who asked about AI because I am “the computer guy” he knows. I explained, roughly, how I understand LLMs to operate: they are pattern matching to guess what the next token should be, based on statistical probability. I explained that they sometimes hallucinate or go off on wild tangents because of this, and that they can be really good at aping and regurgitating things, but there is no understanding, just respinning of fragments to try to generate a response that pleases the asker.
He observed, “oh, we are creating computer religions, just without the practical aspects of having to operate in the mundane world that have to exist before a real religion can get started. That’s good, religions that have become untethered from day-to-day practical life have never caused problems for anyone.”
Which I found scarily insightful.
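To make the “guess the next token” part of that explanation concrete, here’s a toy bigram model, the idea in miniature: it counts which word follows which in a scrap of text and samples the next word in proportion to those counts. (Real LLMs use neural networks over long contexts; the training snippet here is made up.)

```python
import random
from collections import Counter, defaultdict

# Tiny made-up "training" text; a real model trains on vastly more.
text = "the cat sat on the mat and the cat ate the fish".split()

# Count which word follows which: crude statistical pattern matching.
follows = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    follows[a][b] += 1

def next_token(word):
    options = follows[word]
    if not options:  # dead end: the text never continued past this word
        return None
    # Sample in proportion to how often each word followed `word`.
    return random.choices(list(options), weights=list(options.values()))[0]

# Generate by repeatedly guessing the next word. It strings together
# locally plausible fragments with no understanding of the whole.
word, out = "the", ["the"]
for _ in range(8):
    word = next_token(word)
    if word is None:
        break
    out.append(word)
print(" ".join(out))
```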
Here’s the key distinction:
This only makes AI models unreliable if they ignore “don’t scrape my site” requests. If they respect the wishes of the sites whose data they’re profiting from, then there’s no issue.
People want AI models to not be unreliable, but they also want them to operate with integrity in the first place, and not profit from the work of people who have explicitly opted it out of training.
This will only degrade the quality of models from bad actors who don’t follow the rules. You want to sell a good-quality AI model trained on real content instead of misleading AI output? Just follow the rules ;)
Doesn’t sound too bad to me.
Maybe it will learn discretion and what sarcasm is, instead of being a front-loaded Google search of 90% ads and 10% forums. It has no way of knowing if what it’s copy-pasting is full of shit.