Tracking AI search referrals in server logs for maritime sites

If you want to know whether your maritime AI SEO work is paying off, server logs tell you faster than analytics. By the time GA4 shows you a traffic shift, the underlying retrieval pattern has been visible in raw logs for weeks. Most maritime marketing teams do not look at logs at all. The ones that do have a measurable advantage.

What to look for in the logs

AI bot crawls

These are the user agents that fetch your pages on behalf of LLM training pipelines, retrieval systems and live chat sessions.

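Training and indexing crawlers. GPTBot (OpenAI), ClaudeBot and anthropic-ai (Anthropic), CCBot (Common Crawl, used by many systems) and Bytespider (ByteDance). Google-Extended and Applebot-Extended belong to the same family, but they are robots.txt control tokens rather than request user agents: the fetching is done by Google’s and Apple’s regular crawlers, so these two names will not appear in your logs.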

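Retrieval and live-fetch crawlers. OAI-SearchBot, ChatGPT-User (real-time fetch when a user asks a question), PerplexityBot and Perplexity-User. Some logs also show a GeminiBot string, but Google does not document a crawler under that name, so check the source IP before counting those hits.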

The pattern matters. A spike in ChatGPT-User traffic on a specific page often precedes that page appearing in citations by days or weeks. Watching the live-fetch user agents is the closest thing to a real-time signal you have.
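The split between training crawlers and live-fetch agents can be sketched as a small classifier. This is a minimal Python sketch, not a definitive list: the function name classify_user_agent and the category labels are illustrative, matching is by case-insensitive substring, and Google-Extended and Applebot-Extended are left out because they are robots.txt control tokens rather than strings sent with requests.

```python
# Tokens drawn from the crawler lists above; extend as new agents appear.
TRAINING_TOKENS = ("GPTBot", "ClaudeBot", "anthropic-ai", "CCBot", "Bytespider")
LIVE_FETCH_TOKENS = ("OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "Perplexity-User")


def classify_user_agent(user_agent: str) -> str:
    """Return 'live_fetch', 'training' or 'other' for a raw user agent string."""
    ua = user_agent.lower()
    # Live-fetch agents are checked first because they are the real-time signal.
    if any(token.lower() in ua for token in LIVE_FETCH_TOKENS):
        return "live_fetch"
    if any(token.lower() in ua for token in TRAINING_TOKENS):
        return "training"
    return "other"
```

Keeping the two token tuples in one place makes the quarterly update a one-line change.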

Referrer-based AI search traffic

When a buyer clicks through from a chat tool to your site, the referrer header (where it survives) typically reads chatgpt.com (chat.openai.com in older logs), perplexity.ai, gemini.google.com or copilot.microsoft.com. These are real users, not bots. These visits are smaller in volume than bot traffic but disproportionately valuable: the visitor has already read an answer that mentioned you and clicked through for more.
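Matching those referrers is a hostname comparison. The sketch below assumes the referrer arrives as a full URL or as "-" when absent (the usual access-log convention); the function name ai_referrer_source is illustrative, and the host set includes chatgpt.com, ChatGPT’s current domain, alongside the older chat.openai.com.

```python
from typing import Optional
from urllib.parse import urlparse

AI_REFERRER_HOSTS = {
    "chatgpt.com",
    "chat.openai.com",
    "perplexity.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
}


def ai_referrer_source(referrer: str) -> Optional[str]:
    """Return the matching AI chat host for a referrer URL, or None."""
    if not referrer or referrer == "-":
        return None
    host = (urlparse(referrer).hostname or "").lower()
    # Treat www.perplexity.ai and perplexity.ai as the same source.
    if host.startswith("www."):
        host = host[4:]
    return host if host in AI_REFERRER_HOSTS else None
```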

Direct traffic with no referrer

A growing share of AI-driven traffic arrives with no referrer at all because the user pasted a URL the chat tool generated, or the chat tool stripped the referrer. Watch for direct-traffic spikes correlated with bot crawl spikes. The pattern is suggestive even when not provable.
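One rough way to test that correlation is a Pearson coefficient over two daily series: direct visits and AI bot fetches. The sketch below is an assumption about how you might check it, not a proof of causation; the function name and the choice of daily buckets are illustrative.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length daily series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A flat series has no variance, so correlation is undefined; report 0.0.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)
```

A value near 1 over several weeks is suggestive in the sense described above; shifting one series by a few days before comparing can show whether crawls lead visits.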

How to instrument the logs

Most maritime sites we work with run on a stack that produces some flavour of access log: nginx, Apache, Cloudflare, Vercel or the WordPress host’s own log format. Pull the last 90 days into a single normalised dataset. The schema you want is minimal:

Timestamp
IP address (for sanity-checking that the user agent is genuine)
User agent string
Path
Referrer
Status code
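For nginx and Apache, those six fields can be pulled from the standard combined log format with one regular expression. This is a minimal sketch that assumes the default combined format; Cloudflare, Vercel and managed WordPress hosts use other layouts and need their own parser. The names COMBINED and parse_line are illustrative.

```python
import re
from datetime import datetime
from typing import Optional

# ip ident user [time] "METHOD path protocol" status bytes "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Parse one combined-format line into the six-field schema, or None."""
    match = COMBINED.match(line)
    if match is None:
        return None  # malformed or non-combined line; skip rather than guess
    row = match.groupdict()
    # Example timestamp: 12/Mar/2025:06:25:24 +0000
    row["timestamp"] = datetime.strptime(row.pop("ts"), "%d/%b/%Y:%H:%M:%S %z")
    row["status"] = int(row["status"])
    return row
```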

Run a daily aggregation that produces:

Top ten pages crawled by AI bots, with weekly trend
Top ten pages with referrals from chatgpt.com, chat.openai.com, perplexity.ai, gemini.google.com or copilot.microsoft.com
Total bot fetches per day, by named user agent

A simple Python or Node script writing to a small SQLite database is enough. We have built more sophisticated versions for larger clients, but the daily aggregation script is what does 80% of the work.
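A minimal sketch of that aggregation in Python with SQLite follows. The table name hits and its columns are assumptions: bot holds the matched crawler name (NULL for human traffic) and referrer_host holds the matched AI chat host (NULL otherwise), both derived during parsing. The weekly trend is left out for brevity; it is the same query grouped by week.

```python
import sqlite3


def build_db(rows, path=":memory:"):
    """Create the hits table and load (day, bot, path, referrer_host) rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(day TEXT, bot TEXT, path TEXT, referrer_host TEXT)"
    )
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def fetches_per_day(conn):
    """Total bot fetches per day, by named user agent."""
    return conn.execute(
        "SELECT day, bot, COUNT(*) FROM hits WHERE bot IS NOT NULL "
        "GROUP BY day, bot ORDER BY day, bot"
    ).fetchall()


def top_bot_pages(conn, limit=10):
    """Pages most often fetched by AI bots."""
    return conn.execute(
        "SELECT path, COUNT(*) AS n FROM hits WHERE bot IS NOT NULL "
        "GROUP BY path ORDER BY n DESC, path LIMIT ?",
        (limit,),
    ).fetchall()


def top_referral_pages(conn, limit=10):
    """Pages most often reached via an AI chat referrer."""
    return conn.execute(
        "SELECT path, COUNT(*) AS n FROM hits WHERE referrer_host IS NOT NULL "
        "GROUP BY path ORDER BY n DESC, path LIMIT ?",
        (limit,),
    ).fetchall()
```

Passing a file path instead of ":memory:" to build_db keeps the history between daily runs.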

What the data tells you

Which pages are working

The pages fetched most often by ChatGPT-User and PerplexityBot are the ones most likely to be cited right now. If your homepage is hit hourly but your tanker management service page sees one fetch a week, the fetch pattern is telling you what the retrieval system considers relevant.

Which pages should be working but are not

A service page that you have written carefully and that is well-linked from your homepage but never gets a live AI fetch is invisible to the retrieval system. That is a signal to audit the page’s structural quality, schema and inbound links.

When a content change starts to land

After you publish a rewritten service page, watch for the first ChatGPT-User or PerplexityBot fetches. They typically arrive within two to four weeks. The first fetch is followed by sporadic recrawls as the retrieval system encounters your page in different contexts. By six to eight weeks, citations should start to appear, and they should show up in your next quarterly audit.
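The lag between publishing and the first live fetch is easy to measure once fetch dates are in the dataset. A small sketch, assuming you record the publish date yourself and pass in the dates of live fetches for that page; the function name is illustrative.

```python
from datetime import date
from typing import Iterable, Optional


def days_to_first_fetch(published: date, fetch_dates: Iterable[date]) -> Optional[int]:
    """Days from publication to the first live fetch on or after it, or None."""
    # Fetches before the publish date hit the old version, so ignore them.
    later = [d for d in fetch_dates if d >= published]
    if not later:
        return None
    return (min(later) - published).days
```

A result of None after four weeks is the cue to audit the page as described in the previous section.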

When you have a problem

A sudden drop in AI bot crawls (especially GPTBot or ClaudeBot) usually means a robots.txt or firewall rule has accidentally excluded them. We have seen this happen three times: a security plugin update, a CDN rule change and a misconfigured Cloudflare Worker. All three were caught within days because the log-aggregation script flagged the gap. Without logs, the same issue would have been invisible for a quarter.
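A gap check of that kind can be as simple as comparing the latest day with a trailing average. The sketch below is one possible rule, not the script referred to above; the seven-day window and the 20% threshold are assumptions to tune against your own traffic.

```python
from statistics import mean
from typing import Sequence


def crawl_dropped(daily_counts: Sequence[int], window: int = 7, threshold: float = 0.2) -> bool:
    """True if the latest day's fetch count fell below threshold x trailing mean.

    daily_counts is ordered oldest first; the last item is the most recent day.
    """
    if len(daily_counts) <= window:
        return False  # not enough history to form a baseline
    baseline = mean(daily_counts[-window - 1:-1])
    # A bot that never crawled much cannot meaningfully "drop".
    return baseline > 0 and daily_counts[-1] < threshold * baseline
```

Run it per user agent rather than on the combined total, so that a blocked GPTBot is not masked by steady traffic from other crawlers.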

The instrumentation is one developer day. The ongoing maintenance is one hour a month. The visibility it gives you into AI search is greater than anything a commercial AI SEO tool currently offers.

Frequently asked questions

Will Google Analytics show AI search traffic?

Partially. GA4 captures referrals from chatgpt.com, perplexity.ai and a handful of others, but it misses visits where the user copy-pastes a link, and it never sees server-side fetches by AI tools, because those do not run the analytics JavaScript. Server logs catch what GA misses, especially the bot fetches that precede a citation.

What user agents should I be watching for?

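GPTBot, OAI-SearchBot, ChatGPT-User, PerplexityBot, Perplexity-User, ClaudeBot, anthropic-ai, Bytespider and CCBot (Common Crawl). Google-Extended is a robots.txt token rather than a logged user agent, so manage it in robots.txt instead of the watch list. The list grows. Maintain it as a single source of truth in your log-analysis script and update quarterly.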

How do we tell genuine AI bot fetches from spoofed user agents?

Cross-check the user agent against the source IP range. The major LLM providers publish IP ranges or verification mechanisms (OpenAI publishes IP ranges for GPTBot, for example, and Google supports reverse DNS verification for its crawlers). A request claiming to be GPTBot from a residential IP is almost certainly a scraper. Filter spoofed traffic out of your aggregation script before it pollutes the trend data.
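The IP cross-check can be sketched with Python’s standard ipaddress module. The ranges below are placeholders from the address blocks reserved for documentation, not real crawler ranges; in practice you would load each provider’s published list. The names PUBLISHED_RANGES and is_genuine are illustrative.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), NOT real crawler IPs.
# Replace with the ranges each provider publishes.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/24"],
}


def is_genuine(bot: str, ip: str, ranges=PUBLISHED_RANGES) -> bool:
    """True if ip falls inside one of the ranges listed for the claimed bot."""
    addr = ipaddress.ip_address(ip)
    # An unknown bot has no ranges, so it is treated as unverified.
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges.get(bot, []))
```

Rows that fail the check are best kept in a separate table rather than deleted, since a surge of spoofed requests is itself worth knowing about.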