‘Reddit can survive without search’: company reportedly threatens to block Google (www.theverge.com)
Posted by AnActOfCreation@programming.dev to Technology@lemmy.world · English · 9 months ago · 314 comments · cross-posted to: technology@lemmy.world
online@lemmy.ml · English · 9 months ago (edited)
Speaking of this, which parts of the fediverse have added lines to their robots.txt to opt out of generative AI training?
https://blog.google/technology/ai/an-update-on-web-publisher-controls/
https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
https://techcrunch.com/2023/09/28/medium-hints-at-a-nascent-media-coalition-to-block-ai-crawlers/
It looks like there's a handful of these lines you'd have to add to robots.txt. Is there anywhere that keeps a comprehensive list of them?
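For illustration, here is a minimal sketch of what such an opt-out looks like, using Python's standard `urllib.robotparser` to check the result. The list of crawler tokens is an assumption and deliberately non-exhaustive: `Google-Extended` is the token described in Google's publisher-controls post, while `GPTBot` (OpenAI) and `CCBot` (Common Crawl) are other commonly cited examples. The helper names `build_robots_txt` and `is_allowed` and the `example.org` URL are hypothetical. Keep in mind that robots.txt is advisory; it only stops crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: an illustrative, non-exhaustive list of AI-training crawler tokens.
AI_CRAWLERS = ["Google-Extended", "GPTBot", "CCBot"]


def build_robots_txt(blocked_agents):
    """Hypothetical helper: return robots.txt text that disallows the given
    agents site-wide while leaving every other crawler allowed."""
    # Several User-agent lines in a row form one group sharing the rules below.
    lines = [f"User-agent: {agent}" for agent in blocked_agents]
    lines.append("Disallow: /")   # block the whole site for that group
    lines.append("")              # blank line ends the group
    lines.append("User-agent: *")
    lines.append("Disallow:")     # empty Disallow means "allow everything"
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url="https://example.org/post/1"):
    """Hypothetical helper: ask the stdlib parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    robots = build_robots_txt(AI_CRAWLERS)
    print(robots)
    # Googlebot (regular search crawling) is not in the blocked group.
    for agent in AI_CRAWLERS + ["Googlebot"]:
        print(agent, "allowed" if is_allowed(robots, agent) else "blocked")
```

Running it prints the generated file, then reports the three listed crawlers as blocked and Googlebot as allowed. The generated text is what would be served at the site's /robots.txt; adding a new crawler is just one more entry in the list.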
kingthrillgore@lemmy.ml · English · 9 months ago
I've been trying to find a list as well, to no avail. The ones I do know are in my own robots.txt, at volcanolair.co/robots.txt.
online@lemmy.ml · English · 9 months ago
Someone should make a GitHub repo that collects them all in one place, with sources, and keeps the list updated as new ones appear.