I just banned 41 bad user agents from accessing any of my services. 😱
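(For the curious, a rough sketch of what a ban like that can look like as Go middleware in front of a service. The agent list below is just an illustrative subset, not the actual 41, and nothing here reflects the real setup:)

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Illustrative subset only; the real blocklist had 41 entries.
var badAgents = []string{"GPTBot", "CCBot", "ClaudeBot", "Bytespider", "Amazonbot"}

// blockBadAgents rejects any request whose User-Agent contains a banned token.
func blockBadAgents(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ua := r.Header.Get("User-Agent")
		for _, bad := range badAgents {
			if strings.Contains(ua, bad) {
				http.Error(w, "Forbidden", http.StatusForbidden)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	})
	http.ListenAndServe(":8080", blockBadAgents(mux))
}
```

The same thing can of course be done one layer up in a reverse proxy (nginx's `map $http_user_agent`, Caddy's request matchers) instead of in each service.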
Bloody fucking hell. I think one of Google’s GenAI crawlers was just hitting my Gitea instance quite hard. Fuck 🤬 Geez
@prologic You might (not) enjoy this blog post: https://pod.geraspora.de/posts/17342163
@movq@www.uninformativ.de Yeah it’s starting to piss me off too 🤣 Not nearly as much as that guy, but still. Anyway, I’m having fun! Now I just need to find a good IP/subnet list that I can blacklist entirely, ideally one that’s updated frequently so I can refresh firewall rules.
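(A sketch of what that refresh could look like, assuming nftables; the blocklist URL and the table/set names are hypothetical placeholders, not a real feed:)

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"os"
	"os/exec"
	"strings"
)

// listURL is hypothetical; substitute whatever frequently-updated
// CIDR blocklist you end up trusting.
const listURL = "https://example.com/ai-crawler-cidrs.txt"

func main() {
	resp, err := http.Get(listURL)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Collect one CIDR per line, skipping blanks and comments.
	var cidrs []string
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		cidrs = append(cidrs, line)
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	if len(cidrs) == 0 {
		log.Fatal("empty blocklist; refusing to flush existing rules")
	}

	// Assumes the table, set, and drop rule were created once beforehand:
	//   nft add table inet filter
	//   nft add set inet filter ai_crawlers '{ type ipv4_addr; flags interval; }'
	//   nft add chain inet filter input '{ type filter hook input priority 0; }'
	//   nft add rule inet filter input ip saddr @ai_crawlers drop
	batch := fmt.Sprintf("flush set inet filter ai_crawlers\nadd element inet filter ai_crawlers { %s }\n",
		strings.Join(cidrs, ", "))

	cmd := exec.Command("nft", "-f", "-")
	cmd.Stdin = strings.NewReader(batch)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```

Run from a cron job or systemd timer, the flush-and-re-add keeps the set in sync with the upstream list without accumulating stale entries.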
Did you have a disallow rule in robots.txt? (I think not, because I can google several twtxt.net posts)
@doesnm@doesnm.p.psf.lt No. I generally don’t put up any robots.txt files at all really, because they mostly get ignored. I don’t generally mind if “normal” web crawlers crawl things. But LLM(s) can go fuck themselves 🤣
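(For reference, the disallow rules being discussed look like this in a robots.txt, naming a couple of well-known AI crawlers, which, as noted above, they are free to ignore:)

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```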
@prologic Yeah, robots.txt or ai.txt are not worth the effort. I have them, but they get ignored. Just now, I saw a stupid AI bot hitting one of my blog posts like crazy. Not just once, but hundreds of times, over and over. 🤦🙄
@movq@www.uninformativ.de Yeah, I swear to god the engineers that write this shit™ don’t know how to write distributed crawlers that don’t hammer the shit™ out of their targets 🤦‍♂️