maritimemarketing . agency
AI SEO 5 Dec 2025

Should maritime companies block AI crawlers? A balanced read

The case for and against blocking GPTBot, ClaudeBot and Google-Extended on a maritime corporate site, and what we recommend for most clients.

Every quarter we get the same question from a maritime client: should we block GPTBot? It usually comes from the legal team or from a board member who has read a copyright-litigation headline. The honest answer is more nuanced than either “yes, block them” or “no, leave it open”.

The case for blocking

Training-data control

When OpenAI’s GPTBot or Anthropic’s ClaudeBot crawls your site, the content is likely to be used to train future models. Google-Extended works slightly differently: it is a robots.txt token rather than a separate crawler, and it controls whether pages already fetched by Google’s crawlers may be used for its generative AI models. For some companies this raises real concerns: proprietary methodology, trade secrets that have leaked into marketing copy, commercially sensitive data on partnerships or pricing.

A maritime equipment manufacturer whose website describes proprietary engineering in detail has a legitimate reason to limit what enters the training corpus. So does a ship management company whose case studies inadvertently disclose confidential client information.

Litigation positioning

Some maritime companies, particularly those with US listings, see active blocking of AI training crawlers as a defensive position in case AI copyright litigation evolves into class actions or licensing regimes. Whether this is sound legal strategy depends on your counsel and your jurisdiction; we are not lawyers and do not give legal advice. But the position is plausible enough that it shows up on board agendas.

Bandwidth and cost

For very large maritime sites with technical libraries, regulatory archives and vessel databases, AI crawler traffic can be significant. Cloudflare reports that GPTBot, ClaudeBot and similar crawlers can collectively consume meaningful bandwidth. For most corporate sites the load is negligible; for content-heavy specialist platforms it is real.

The case against blocking

You forfeit citation visibility

Blocking GPTBot prevents your content from being used in training future models. Over a horizon of two to five years, this likely degrades your visibility in ChatGPT for queries where the model relies on training data rather than retrieval. Your competitors who did not block will be in the answers; you will not.

Live retrieval is still possible if you allow ChatGPT-User

Many companies block the training crawlers but allow the live-fetch user agents. This is a defensible middle position: opt out of training but stay visible in real-time citations. But it requires careful robots.txt configuration that most marketing teams cannot maintain alone.
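The middle position described above can be sketched as a robots.txt fragment and sanity-checked with Python's standard-library parser. This is a minimal illustration, not a definitive configuration: the split of user agents into "training" and "live-fetch" groups follows this article, the test path is hypothetical, and the current token names should be confirmed against each vendor's documentation before deployment.

```python
from urllib.robotparser import RobotFileParser

# Assumed grouping of tokens, following the article; verify against vendor docs.
ROBOTS_TXT = """\
# Opt out of model training
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

# Stay reachable for live fetches and AI search
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
Allow: /

# Everyone else keeps normal access
User-agent: *
Allow: /
"""


def allowed(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "/fleet/case-studies" is a hypothetical path used only for illustration.
    for agent in ("GPTBot", "ClaudeBot", "Google-Extended",
                  "ChatGPT-User", "OAI-SearchBot", "Googlebot"):
        verdict = "allowed" if allowed(ROBOTS_TXT, agent, "/fleet/case-studies") else "blocked"
        print(f"{agent:16} {verdict}")
```

Python's parser applies the first matching rule in a group and matches agent names by substring, so it is a useful sanity check rather than a guarantee of how each crawler interprets the file. Keeping each group to a single `Allow: /` or `Disallow: /` avoids depending on rule-precedence differences.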

The competitive cost is asymmetric

If three quarters of your competitors leave their sites open to AI crawlers, blocking puts you at a structural disadvantage in AI search even if your content quality is identical. The asymmetry is real and growing. In most maritime categories we audit, the brands that have remained open are accumulating citation-graph weight that the blockers are not.

Reversal is harder than people expect

Blocking is easy. Unblocking does not immediately recover lost ground, because model versions trained while you were blocked were built without your content. You can rejoin the training pipeline going forward, but you cannot recover the missed months retroactively.

What we recommend for most maritime corporate clients

For most ship managers, port operators, marine equipment manufacturers, classification societies and maritime software vendors, our default recommendation is to keep the site open to AI crawlers, with a clean robots.txt that names the user agents you actually care about. Be specific. Block what genuinely needs blocking. Allow what supports citation visibility.

A reasonable starting point:

  • Allow GPTBot, ClaudeBot, Google-Extended, OAI-SearchBot, ChatGPT-User, PerplexityBot, anthropic-ai. These contribute to your citation visibility.
  • Disallow those user agents from directories that contain genuinely sensitive content: client-only login areas, gated technical specs, internal documents that have ended up indexable by mistake.
  • Audit quarterly. Whichever side you come down on, the user agent list changes and your robots.txt should be reviewed each quarter.
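As a rough illustration of the starting point above, the sketch below generates a robots.txt that allows the listed user agents site-wide while disallowing a few sensitive directories, then checks the result with Python's standard-library parser. The directory names and test paths are hypothetical placeholders, and `build_robots_txt` is a helper invented for this example.

```python
from urllib.robotparser import RobotFileParser

# User agents named in the list above.
AI_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "OAI-SearchBot",
             "ChatGPT-User", "PerplexityBot", "anthropic-ai"]

# Hypothetical directory names; replace with the real sensitive paths.
SENSITIVE_DIRS = ["/client-portal/", "/specs/gated/", "/internal/"]


def build_robots_txt(agents: list[str], sensitive_dirs: list[str]) -> str:
    """Build a robots.txt that opens the site but fences off sensitive paths."""
    disallows = [f"Disallow: {d}" for d in sensitive_dirs]
    lines = [f"User-agent: {a}" for a in agents]
    # Disallow rules come first so first-match parsers hit them before Allow: /.
    lines += disallows + ["Allow: /", ""]
    # Apply the same fences to every other crawler.
    lines += ["User-agent: *"] + disallows + ["Allow: /"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_robots_txt(AI_AGENTS, SENSITIVE_DIRS)
    print(text)
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    for path in ("/services/ship-management", "/internal/memo.pdf"):
        print(path, parser.can_fetch("GPTBot", path))
```

Robots.txt is a request that well-behaved crawlers honour, not an access control, so client logins and internal documents still need authentication behind them.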

When to lean towards blocking

There are scenarios where blocking is the right answer:

  • Companies with sanctions exposure, where any extraction of operational data into a training corpus is unacceptable.
  • Companies with active patent or trade secret protection on content that has been indexed.
  • Companies whose primary digital asset is a proprietary database or technical library that they monetise through subscriptions.

For these cases, work with your legal team on the specific user agents to block and document the decision. Treat it as a strategic call, not a default.

The legal landscape on AI training and copyright is evolving in the EU, the UK and the US. By the time you read this, the position may have shifted. The technical recommendations here address the visibility trade-off; the legal trade-off is a separate analysis your counsel should make with current information. The two trade-offs do not always pull in the same direction.

Frequently asked questions

Does blocking AI crawlers prevent citations?

Partially. Blocking GPTBot prevents OpenAI from training future models on your content but does not stop ChatGPT from citing you when it retrieves your page live through ChatGPT-User. Blocking ChatGPT-User prevents the live retrieval. To suppress all citations you would need to block training crawlers, retrieval crawlers and live-fetch user agents, which most companies do not actually want.

Are there sectors of maritime where blocking makes sense?

Yes: where content is highly confidential or sanctions-sensitive, where executive protection is a consideration, or where you hold proprietary technical data that you do not want extracted into training corpora. For general corporate marketing content, the upside of being indexed and cited typically outweighs the downside.

How often does the AI crawler user agent list change?

New user agents appear two or three times a year, often quietly, and existing crawlers occasionally rename or split into training and retrieval variants. Build the user agent list into a quarterly robots.txt review rather than treating the file as a set-and-forget artefact. We have seen sites accidentally block a new crawler for a full quarter because nobody refreshed the list.
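Part of that quarterly review can be scripted. The sketch below, a minimal example under stated assumptions, reports which user agents on a watchlist have no explicit group in a robots.txt file and therefore fall through to the `*` rules. The watchlist and the sample file are illustrative only, and `explicit_agents` and `unnamed_agents` are helper names invented here.

```python
def explicit_agents(robots_txt: str) -> set[str]:
    """Collect every user agent token named in the file, lower-cased."""
    found = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            found.add(value.strip().lower())
    return found


def unnamed_agents(robots_txt: str, watchlist: list[str]) -> list[str]:
    """Return watchlist agents with no explicit group, in watchlist order."""
    named = explicit_agents(robots_txt)
    return [agent for agent in watchlist if agent.lower() not in named]


if __name__ == "__main__":
    # Illustrative file and watchlist; maintain the real list from vendor docs.
    sample = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Disallow: /
"""
    watchlist = ["GPTBot", "ChatGPT-User", "PerplexityBot", "OAI-SearchBot"]
    print(unnamed_agents(sample, watchlist))
```

In this sample the catch-all group disallows everything, so any agent the script reports is being blocked by default rather than by decision, which is the failure mode described above.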
