
That's a historical question. At the time, most if not all bots were either search engines or archival crawlers. The file was even called "RobotsNotWanted.txt" at the beginning but was renamed "robots.txt" for simplicity. To give another example, the Internet Archive stopped respecting it a couple of years ago, and they discuss this point (crawlers vs. other bots) here [1].

[1] https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...



You mean search bots vs. other bots? The Internet Archive's bot is a crawler too.

Their post demonstrates no real difference between search bots and archive bots. robots.txt was never about SEO alone. Sites exclude print-friendly versions so visitors land on the full pages, with the ads and the links to other pages. Sites exclude internal search pages to conserve resources. The post itself admits sites exclude large files to control costs. And they can't honestly believe sites want sensitive areas like administrative pages archived.
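
Something like this covers every one of those cases, and none of it is about search ranking. A minimal sketch (the paths here are hypothetical; User-agent and Disallow are the standard directives, and # starts a comment):

    User-agent: *
    Disallow: /print/      # keep print-friendly duplicates out of indexes
    Disallow: /search      # internal search result pages waste crawl resources
    Disallow: /downloads/  # large files cost bandwidth
    Disallow: /admin/      # sensitive areas shouldn't be indexed or archived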

In reality, the Internet Archive stopped respecting robots.txt because they wanted to archive what sites didn't want them to archive. Many sites disallowed the Internet Archive's bot specifically. Many sites allowed only specific bots. Many sites disallowed all bots and meant all bots. And hiding old snapshots whenever a new domain owner changed robots.txt was a self-inflicted problem: robots.txt describes what may be crawled now, not retroactively. They knew all of this. Each of those cases is trivial to express, as sketched below.
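
For concreteness, here's what those three cases look like. ia_archiver is the user agent the Internet Archive's crawler has traditionally honored; the rest is standard robots.txt, where an empty Disallow means "allow everything" for that bot:

    # Disallow the Internet Archive specifically
    User-agent: ia_archiver
    Disallow: /

    # Allow only one specific bot...
    User-agent: Googlebot
    Disallow:

    # ...and disallow all other bots, meaning all of them
    User-agent: *
    Disallow: /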


If it were purely a historical question, another text file for handling AI requests (e.g. ai-bots.txt) would exist by now. It doesn't, and it likely never will: they don't want to even have to pretend to comply with creators' requests about whether their sites may be used.



