Robots.txt
Quick definition
Robots.txt is a plain-text file in your site's root that tells search engine crawlers which URLs they may or may not crawl.
Robots.txt is a crawl directive, not an indexing directive. Blocking a URL in robots.txt prevents crawling, but the URL can still appear in search results (typically without a description) if other pages link to it.
Why Robots.txt matters
Use robots.txt to manage crawl budget: block low-value URL spaces such as infinite-parameter URLs, internal search results, and admin paths, so crawlers spend their requests on the pages you want discovered.
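As a rough illustration, the sketch below parses a small rule set with Python's standard-library urllib.robotparser and checks which paths a crawler may fetch. The rules, the paths, and the "ExampleBot" user agent are hypothetical. The standard-library parser does plain prefix matching and does not understand wildcards, so parameter patterns are left out here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for an example site; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up crawler name; it falls under the "*" group.
for path in ["/admin/users", "/search?q=pricing", "/pricing"]:
    verdict = "allowed" if parser.can_fetch("ExampleBot", path) else "blocked"
    print(f"{path}: {verdict}")
```

A check like this only shows how one parser reads the rules. Individual search engines may interpret edge cases differently.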
How Robots.txt works in practice
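Because robots.txt controls crawling only, use a meta robots 'noindex' tag or an HTTP X-Robots-Tag header to prevent indexing. A noindexed page must remain crawlable (not blocked in robots.txt) for the directive to be read: a crawler that cannot fetch the page never sees the noindex.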
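One way to catch this conflict is to compare the pages you have marked noindex against your robots.txt rules. The sketch below is a minimal example of that check. The function name blocked_noindex_paths, the rules, and the paths are all assumptions made for illustration, and the list of noindexed paths would in practice come from your own crawl or CMS data.

```python
from urllib.robotparser import RobotFileParser

def blocked_noindex_paths(robots_txt, noindex_paths, user_agent="*"):
    """Return noindexed paths that robots.txt stops a crawler from fetching.

    Any path returned here carries a noindex directive the crawler cannot read.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in noindex_paths if not parser.can_fetch(user_agent, p)]

# Hypothetical rules and page list for illustration only.
robots_txt = """\
User-agent: *
Disallow: /internal/
"""
noindex_paths = ["/internal/old-pricing", "/thank-you"]

conflicts = blocked_noindex_paths(robots_txt, noindex_paths)
print(conflicts)  # ['/internal/old-pricing']
```

Each path in the result needs either its Disallow rule lifted or a different removal method, since its noindex will not take effect while it stays blocked.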
Best practices
- Don't use robots.txt to deindex content — use noindex instead.
- Check every change in Search Console's robots.txt report, which replaced the retired robots.txt Tester.
- Always include a Sitemap directive that gives the absolute URL of your XML sitemap.
- Be careful with wildcard rules (* and $): a broad pattern can over-block, matching URLs you never meant to hide.
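To see how a wildcard can over-block, the sketch below translates a robots.txt path pattern into a regular expression and tests a few paths against it. It assumes the matching style documented by the major search engines, where * matches any run of characters and a trailing $ anchors the end of the URL. The helper names rule_to_regex and is_blocked and the sample paths are hypothetical, and the code ignores Allow rules and rule precedence.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the URL; everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(disallow_pattern: str, path: str) -> bool:
    """True if the path matches the Disallow pattern from its first character."""
    return rule_to_regex(disallow_pattern).match(path) is not None

# The wildcard rule catches internal search, but also an unrelated page.
print(is_blocked("/*search", "/search?q=robots"))       # True
print(is_blocked("/*search", "/research/seo-report"))   # True (over-block)

# The plain prefix rule is narrower.
print(is_blocked("/search", "/research/seo-report"))    # False

# A trailing '$' limits the match to URLs that end with the pattern.
print(is_blocked("/*.pdf$", "/files/guide.pdf"))             # True
print(is_blocked("/*.pdf$", "/files/guide.pdf?download=1"))  # False
```

Running candidate patterns against a list of real URLs from your site before deploying them is a cheap way to spot unintended matches.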
Related terms
XML Sitemap
An XML sitemap is a structured file listing the URLs you want search engines to crawl and index, with metadata like last-modified dates and priority.
Crawl Budget
Crawl budget is the number of pages a search engine bot will crawl on your site within a given period, based on crawl rate limit and crawl demand.
Indexing
Indexing is the process by which search engines add discovered pages to their searchable index, making them eligible to rank for queries.