Lemmy hates AI.
To be clear, I fully support accessibility for people with disabilities. It’s ironic, though. Does Lemmy’s open source code make it easier for bots to scrape it?
Something I mention every time this comes up: AI doesn’t need to scrape Lemmy. All someone has to do is set up their own federated instance, and ActivityPub will wrap everything up in a nice JSON format for them to parse however they want. And there’s fundamentally nothing a person can do about it.
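To make the "nice JSON format" point concrete, here is a rough Python sketch of what parsing one federated activity could look like. The payload is hand-written for illustration in the general ActivityStreams shape (a `Create` activity wrapping a `Note`). The `example.social` URLs and the `extract_text` helper are made up, and what a real instance actually sends will have more fields and may differ.

```python
import json

# Hypothetical, hand-written activity in the ActivityStreams style.
# Not captured from a real instance; real payloads carry more fields.
SAMPLE_ACTIVITY = """
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Create",
  "actor": "https://example.social/u/alice",
  "object": {
    "type": "Note",
    "id": "https://example.social/comment/1",
    "attributedTo": "https://example.social/u/alice",
    "content": "<p>Hello from the fediverse</p>"
  }
}
"""


def extract_text(raw):
    """Return (author, content) for a Create activity wrapping a Note, else None."""
    activity = json.loads(raw)
    obj = activity.get("object")
    # Only handle the one shape this sketch cares about.
    if activity.get("type") != "Create" or not isinstance(obj, dict):
        return None
    if obj.get("type") != "Note":
        return None
    return obj.get("attributedTo"), obj.get("content")


if __name__ == "__main__":
    print(extract_text(SAMPLE_ACTIVITY))
```

No scraping or HTML crawling is involved. The content arrives already structured, which is the whole point.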
It’s just best to realize anything and everything on Lemmy is publicly available for any use, good or bad.
It’s not perfect training data. Being encouraged to add alt text and actually doing it are two different things. Writing good alt text is another matter altogether. And anything that’s on the internet is training data whether people want it to be or not. The only difference is an ethical one: whether the scraper accepts and respects robots.txt, i.e. “do not scrape,” which communicates the data holders’ intentions. And if they torrent books, you can guess how respectful they are.
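For anyone wondering what "respecting robots.txt" means in practice, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleAIBot` user agent, and the `may_fetch` helper are all invented for illustration. The check is entirely voluntary: nothing stops a scraper from skipping it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one made-up AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.social/post/1"
    print(may_fetch("ExampleAIBot", url))
    print(may_fetch("SomeOtherClient", url))
```

A well-behaved crawler runs a check like this before every request and backs off when the answer is no. A badly behaved one just never calls it.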


