Should maritime companies block AI crawlers? A balanced read
The case for and against blocking GPTBot, ClaudeBot and Google-Extended on a maritime corporate site, and what we recommend for most clients.
Every quarter we get the same question from a maritime client: should we block GPTBot? It usually comes from the legal team or from a board member who has read a copyright-litigation headline. The honest answer is more nuanced than either “yes, block them” or “no, leave it open”.
The case for blocking
Training-data control
When OpenAI’s GPTBot or Anthropic’s ClaudeBot crawls your site, the content is likely to be used to train future models. Google-Extended works slightly differently: it is a robots.txt token, not a separate crawler, and it controls whether content Google has already crawled may be used for its AI models. For some companies this raises real concerns: proprietary methodology, trade secrets that have leaked into marketing copy, commercially sensitive data on partnerships or pricing.
A maritime equipment manufacturer whose website describes proprietary engineering in detail has a legitimate reason to limit what enters the training corpus. So does a ship management company whose case studies inadvertently disclose confidential client information.
Litigation positioning
Some maritime companies, particularly those with US listings, see active blocking of AI training crawlers as a defensive position in case AI copyright litigation evolves into class actions or licensing regimes. Whether this is sound legal strategy depends on your counsel and your jurisdiction; we are not lawyers and do not give legal advice. But the position is plausible enough that it shows up on board agendas.
Bandwidth and cost
For very large maritime sites with technical libraries, regulatory archives and vessel databases, AI crawler traffic can be significant. Cloudflare reports that GPTBot, ClaudeBot and similar crawlers can collectively consume meaningful bandwidth. For most corporate sites the load is negligible; for content-heavy specialist platforms it is a real cost, and the way to know which group you are in is to measure it in your own server logs.
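To judge whether bandwidth is a real concern for a given site, one option is to total the response bytes served to AI crawlers from the web server's access log. The sketch below assumes logs in the common "combined" format; the token list, the sample lines and the function name `bytes_by_crawler` are illustrative assumptions, not a definitive inventory of crawlers.

```python
import re
from collections import Counter

# Assumed list of user-agent substrings to look for; review it against
# each vendor's current documentation. Google-Extended is omitted because
# it is a robots.txt token and does not appear as a user-agent string.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot",
                     "OAI-SearchBot", "ChatGPT-User")

# Matches the tail of a combined-format log line:
#   "REQUEST" status size "referer" "user-agent"
LOG_TAIL = re.compile(
    r'" (?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def bytes_by_crawler(lines):
    """Sum response bytes per AI crawler token found in the user agent."""
    totals = Counter()
    for line in lines:
        match = LOG_TAIL.search(line)
        if not match or match["size"] == "-":
            continue  # unparseable line, or no body was sent
        agent = match["agent"].lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in agent:
                totals[token] += int(match["size"])
                break
    return totals


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    '203.0.113.7 - - [10/Mar/2025:13:55:36 +0000] "GET /library/a.pdf HTTP/1.1" 200 1000 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.7 - - [10/Mar/2025:13:56:01 +0000] "GET /library/b.pdf HTTP/1.1" 200 2500 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '198.51.100.4 - - [10/Mar/2025:14:02:10 +0000] "GET /news/ HTTP/1.1" 200 400 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [10/Mar/2025:14:05:44 +0000] "GET / HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

if __name__ == "__main__":
    for token, total in bytes_by_crawler(SAMPLE).most_common():
        print(f"{token}: {total} bytes")
```

Comparing these totals with overall traffic for the same period gives a rough share of bandwidth attributable to AI crawlers. User-agent strings can be spoofed, so treat the result as an estimate.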
The case against blocking
You forfeit citation visibility
Blocking GPTBot prevents your content from being used in training future models. Over a horizon of two to five years, this likely degrades your visibility in ChatGPT for queries where the model relies on training data rather than retrieval. Your competitors who did not block will be in the answers; you will not.
Live retrieval is still possible if you allow ChatGPT-User
Many companies block the training crawlers but allow the live-fetch user agents such as ChatGPT-User. This is a defensible middle position: you opt out of training but stay visible in real-time citations. It does, however, require careful robots.txt configuration that most marketing teams cannot maintain alone.
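A minimal sketch of that middle position, assuming a site-wide block for training crawlers and open access for live-fetch agents. The robots.txt content, the `www.example.com` URL and the helper name `can_fetch` are illustrative assumptions; the check uses Python's standard-library `urllib.robotparser`, whose matching may differ in edge cases from how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of training, stay open to live retrieval.
ROBOTS_TXT = """\
# Training crawlers and tokens: blocked site-wide
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

# Live-fetch and search agents: allowed
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
Allow: /

# Everything else, including ordinary search engines
User-agent: *
Allow: /
"""


def can_fetch(agent, url="https://www.example.com/fleet/"):
    """Return True if the robots.txt above lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot", "ChatGPT-User", "Googlebot"):
        print(f"{agent}: {'allowed' if can_fetch(agent) else 'blocked'}")
```

Running a check like this before each deployment catches the most common mistake: a group edited for one agent that accidentally blocks another.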
The competitive cost is asymmetric
If three quarters of your competitors leave their sites open to crawlers, blocking puts you at a structural disadvantage in AI search even if your content quality is identical. The asymmetry is real and growing. In most maritime categories we audit, the brands that have remained open are accumulating citation-graph weight that the blockers are not.
Reversal is harder than people expect
Blocking is easy. Unblocking does not immediately recover lost ground, because the model versions trained while you were blocked have already shipped without your content. You can rejoin the training pipeline going forward, but the months you missed cannot be recovered.
What we recommend for most maritime corporate clients
For most ship managers, port operators, marine equipment manufacturers, classification societies and maritime software vendors, our default recommendation is to leave the site open to these crawlers, with a clean robots.txt that names the user agents you actually care about. Be specific. Block what genuinely needs blocking. Allow what supports citation visibility.
A reasonable starting point:
- Allow GPTBot, ClaudeBot, Google-Extended, OAI-SearchBot, ChatGPT-User, PerplexityBot, anthropic-ai. These contribute to your citation visibility.
- Disallow those user agents from directories that contain genuinely sensitive content: client-only logins, gated technical specs, internal documents that have ended up indexable by mistake. robots.txt is advisory, so anything truly confidential also needs authentication.
- Audit quarterly. Whichever side you come down on, the list of user agents changes, and your robots.txt should be reviewed against it.
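The starting point above can be sketched as a small generator plus an audit check, which is one way to make the quarterly review repeatable. The directory paths, the `/fleet/` test path and the function names are hypothetical; the agent list is the one given above and should be re-checked against vendor documentation at each review.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "OAI-SearchBot",
             "ChatGPT-User", "PerplexityBot", "anthropic-ai"]

# Hypothetical sensitive directories; replace with the site's real paths.
SENSITIVE_DIRS = ["/client-portal/", "/specs/gated/", "/internal/"]


def build_robots_txt(agents, sensitive_dirs):
    """Allow the named agents everywhere except the sensitive directories."""
    disallows = [f"Disallow: {path}" for path in sensitive_dirs]
    lines = [f"User-agent: {agent}" for agent in agents]
    lines += disallows + ["Allow: /", ""]
    # Apply the same directory rules to every other crawler.
    lines += ["User-agent: *"] + disallows + ["Allow: /"]
    return "\n".join(lines) + "\n"


def audit(robots_txt, agents, sensitive_dirs, public_path="/fleet/"):
    """Return a list of problems; an empty list means the policy holds."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for agent in agents:
        if not parser.can_fetch(agent, public_path):
            problems.append(f"{agent} is blocked from {public_path}")
        for path in sensitive_dirs:
            if parser.can_fetch(agent, path + "doc.pdf"):
                problems.append(f"{agent} can reach {path}")
    return problems


if __name__ == "__main__":
    robots_txt = build_robots_txt(AI_AGENTS, SENSITIVE_DIRS)
    print(robots_txt)
    print(audit(robots_txt, AI_AGENTS, SENSITIVE_DIRS) or "No problems found")
```

In a real audit the `robots_txt` string would be the file actually served by the site, not the generated one, so the check reflects what crawlers see.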
When to lean towards blocking
There are scenarios where blocking is the right answer:
- Companies with sanctions exposure, where any extraction of operational data into a training corpus is unacceptable.
- Companies with active patent or trade secret protection on content that has been indexed.
- Companies whose primary digital asset is a proprietary database or technical library that they monetise through subscriptions.
For these cases, work with your legal team on the specific user agents to block and document the decision. Treat it as a strategic call, not a default.
A note on legal nuance
The legal landscape on AI training and copyright is evolving in the EU, the UK and the US. By the time you read this, the position may have shifted. The technical recommendations here address the visibility trade-off; the legal trade-off is a separate analysis your counsel should make with current information. The two trade-offs do not always pull in the same direction.