Auditing AI visibility: a practical methodology for maritime

A structured AI visibility audit answers a specific question: when buyers in your maritime category ask ChatGPT, Claude, Gemini, Copilot or Perplexity about suppliers, where do you appear and where do you not? Without an audit you are guessing. With one, the gaps become a roadmap.

We have run this audit quarterly for ship managers, port operators, marine equipment manufacturers, classification societies and maritime software vendors. The methodology has settled into something pragmatic.

The structure

1. Prompt set

Build a set of thirty to fifty prompts that real buyers in your category would type. Three categories of prompt:

Category-defining. “Who are the leading independent ship managers for VLCC fleets?” These probe whether you appear in the canonical category answer.

Buyer-style natural language. “I am the chartering manager at a Greek owner with a fleet of LR2 tankers operating Asia-Middle East routes. We need a technical manager. Who should we consider?” These probe whether you appear under realistic buying contexts.

Brand-name probes. “What does Acme Ship Management do?” “How big is Acme Ship Management?” These probe what the model thinks it knows about you specifically.
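One way to keep the prompt set maintainable is to store it as structured data with the prompt type attached. The following is a minimal sketch, not a prescribed tool: the names `Prompt`, `PromptKind` and `validate_prompt_set` are hypothetical, and the size bounds simply encode the thirty-to-fifty guidance above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class PromptKind(Enum):
    CATEGORY = "category-defining"
    BUYER = "buyer-style natural language"
    BRAND = "brand-name probe"


@dataclass(frozen=True)
class Prompt:
    prompt_id: str
    kind: PromptKind
    text: str


def validate_prompt_set(prompts, min_size=30, max_size=50):
    """Return a list of problems; an empty list means the set looks usable."""
    problems = []
    if not min_size <= len(prompts) <= max_size:
        problems.append(
            f"set has {len(prompts)} prompts, expected {min_size}-{max_size}"
        )
    # Every one of the three prompt types should be represented.
    counts = Counter(p.kind for p in prompts)
    for kind in PromptKind:
        if counts[kind] == 0:
            problems.append(f"no prompts of kind: {kind.value}")
    # Stable ids matter because the same set is re-run every quarter.
    ids = [p.prompt_id for p in prompts]
    if len(ids) != len(set(ids)):
        problems.append("duplicate prompt ids")
    return problems
```

Giving each prompt a stable id is a design choice that makes quarter-on-quarter comparison straightforward, since a reworded prompt can then be treated as a new prompt rather than silently replacing an old one.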

2. Engine coverage

Run each prompt through ChatGPT, Claude, Gemini, Copilot and Perplexity, using the current default model on each. Use a fresh chat each time to avoid memory contamination. For ChatGPT, run once with browsing enabled to capture retrieval-time answers and once without to capture training-data-only answers.
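The run matrix implied here can be enumerated before anyone opens a chat window. This sketch only builds the plan and does not call any engine; `build_run_plan` and the mode labels are hypothetical names chosen for illustration.

```python
from itertools import product

ENGINES = ["ChatGPT", "Claude", "Gemini", "Copilot", "Perplexity"]


def build_run_plan(prompt_ids, engines=ENGINES):
    """List every (prompt, engine, mode) run; each run should use a fresh chat."""
    plan = []
    for prompt_id, engine in product(prompt_ids, engines):
        # ChatGPT is run twice: once with browsing and once without.
        modes = ["browsing", "no-browsing"] if engine == "ChatGPT" else ["default"]
        for mode in modes:
            plan.append({"prompt_id": prompt_id, "engine": engine, "mode": mode})
    return plan
```

With five engines and the extra ChatGPT pass, each prompt produces six runs, so a forty-prompt set comes to 240 runs per audit.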

3. Scoring

For each prompt and engine, record:

Whether your brand appears at all (0/1)
Position in the cited list (if listed)
Whether the citation is positive, neutral or qualified
Whether the model cites a specific URL (and which)
The competitor brands cited

A simple spreadsheet works. A more mature programme uses a custom tool that snapshots the full response text alongside the structured score, so you can re-read the responses three months later and notice qualitative shifts.
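As a rough illustration of what such a tool might store, the sketch below pairs the structured score with the full response text. It is an assumption-laden starting point: `Score` and `score_response` are hypothetical names, brand matching is naive case-insensitive substring search, the audited brand must be included in `known_brands`, and the tone field is deliberately left empty for a human reviewer.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Score:
    prompt_id: str
    engine: str
    appears: bool                  # brand appears at all (0/1)
    position: Optional[int]        # 1-based order of first mention among known brands
    tone: Optional[str]            # "positive", "neutral" or "qualified"; set manually
    cited_urls: List[str] = field(default_factory=list)
    competitors: List[str] = field(default_factory=list)
    response_text: str = ""        # full snapshot for later re-reading


def score_response(prompt_id, engine, text, brand, known_brands):
    """Pre-fill a Score from raw response text; tone is left for manual review."""
    lowered = text.lower()
    # Order the known brands by where each is first mentioned in the response.
    found = sorted(
        (lowered.find(b.lower()), b) for b in known_brands if b.lower() in lowered
    )
    ordered = [b for _, b in found]
    appears = brand in ordered
    urls = [u.rstrip(".,;") for u in re.findall(r"https?://[^\s)\]>]+", text)]
    return Score(
        prompt_id=prompt_id,
        engine=engine,
        appears=appears,
        position=ordered.index(brand) + 1 if appears else None,
        tone=None,
        cited_urls=urls,
        competitors=[b for b in ordered if b != brand],
        response_text=text,
    )
```

Order of first mention is only a proxy for position in the cited list, so the pre-filled value is worth checking against the snapshot when the response is not a simple list.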

4. Aggregation

Roll the per-prompt scores up to:

Citation rate by engine (you appear in X% of relevant prompts)
Mean position when cited
Competitive set: which brands appear most often alongside you, and which appear without you
URL pattern: which of your pages get cited and which never do
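The four roll-ups above can be computed from the per-prompt rows in a few lines. This is a sketch under stated assumptions: each row is a plain dict with the hypothetical keys `engine`, `appears`, `position`, `competitors` and `cited_urls` (your own URLs only), and `own_urls` is an optional list of pages you want checked for never being cited.

```python
from collections import Counter, defaultdict
from statistics import mean


def aggregate(rows, own_urls=()):
    """Roll per-prompt score rows (plain dicts) up into summary metrics."""
    by_engine = defaultdict(list)
    for row in rows:
        by_engine[row["engine"]].append(row)

    # Share of prompts on each engine in which the brand appears.
    citation_rate = {
        engine: sum(1 for r in rs if r["appears"]) / len(rs)
        for engine, rs in by_engine.items()
    }
    positions = [r["position"] for r in rows if r["appears"] and r["position"]]

    alongside, without_us, url_counts = Counter(), Counter(), Counter()
    for r in rows:
        # Competitors are split by whether the brand was cited in the same answer.
        (alongside if r["appears"] else without_us).update(r["competitors"])
        url_counts.update(r["cited_urls"])

    return {
        "citation_rate_by_engine": citation_rate,
        "mean_position_when_cited": mean(positions) if positions else None,
        "competitors_alongside": alongside.most_common(),
        "competitors_without_us": without_us.most_common(),
        "cited_urls": url_counts.most_common(),
        "never_cited_urls": [u for u in own_urls if u not in url_counts],
    }
```

Keeping the "without us" counter separate is what surfaces the competitors who win the prompts you are missing from.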

What the results tell you

The first audit usually surprises clients in three ways.

You are absent from prompts you assumed you would dominate. A ship manager with twenty-five years of trade press coverage discovers they appear in only 30% of “leading ship managers” prompts, typically because their service pages are unstructured marketing prose from which the engines cannot easily extract facts.

Your competitors are ranked differently than you would expect. Smaller competitors who have invested in structured content punch above their weight. Larger competitors with stale websites underperform their actual market position.

Specific URLs underperform. Your homepage gets cited; your service pages do not. Or your blog posts get cited but your case studies do not. The pattern points directly at where the next quarter’s content work needs to focus.

Cadence and reporting

Quarterly is the right cadence. Models update on roughly that timeline, content investments compound on roughly that timeline and the underlying buyer behaviour does not shift faster than that.

The internal report should be one page of summary metrics, one page of competitor movement and one page of recommended actions tied to specific URLs. If the audit produces a fifty-page deck nobody reads, you have engineered the wrong artefact. The point is to drive content decisions, not to admire the data.

Once you have a baseline, run the same prompt set every quarter. The trend matters more than any single reading. A site that moves from 28% citation rate to 41% over three quarters is doing the right work. A site that stays flat at 35% with random per-engine variance is not, even if 35% sounds respectable.
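The arithmetic behind reading the trend is simple enough to script. In this sketch `citation_trend` is a hypothetical helper, rates are fractions ordered oldest first, and the `min_gain` threshold of five percentage points is an arbitrary assumption rather than a figure from the audits described here.

```python
def citation_trend(rates, min_gain=0.05):
    """Summarise a series of quarterly citation rates (fractions, oldest first)."""
    if len(rates) < 2:
        raise ValueError("need at least two quarterly readings")
    # Quarter-on-quarter changes, rounded to avoid floating-point noise.
    deltas = [round(b - a, 4) for a, b in zip(rates, rates[1:])]
    net = round(rates[-1] - rates[0], 4)
    return {"deltas": deltas, "net_change": net, "improving": net >= min_gain}
```

For example, `citation_trend([0.28, 0.33, 0.41])` reports a net change of 0.13 and flags the series as improving, while `citation_trend([0.35, 0.36, 0.34, 0.35])` reports a net change of 0.0 and does not.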

The audit takes about a quarter of a day to run cleanly once the prompt set is fixed, and it is the only way to know whether your AI visibility work is paying off. Without it, you are running a content programme on faith.

Frequently asked questions

How many prompts should an AI visibility audit cover?

Thirty to fifty in total for each market category you audit, not per prompt type. Fewer than thirty and you cannot tell signal from noise. More than fifty and the set becomes hard to maintain quarterly. The set should mix buyer-style natural language with category-defining queries and a small number of brand-name probes.

Should we test the same prompt repeatedly to see variance?

Yes. LLM outputs are non-deterministic, especially on borderline questions. Run each prompt three times in one sitting, each time in a fresh chat, and record the union of brands cited across the three runs. That gives a more stable picture than a single shot.
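Taking the union across repeated runs is easy to mechanise. This is a minimal sketch with a hypothetical helper name, `union_of_brands`; alongside the union it returns how many runs each brand appeared in, which is an addition of ours that helps separate stable citations from one-off mentions.

```python
def union_of_brands(runs):
    """Combine brand lists from repeated runs of one prompt.

    Returns the union in first-seen order and, per brand, the number of
    runs in which it appeared.
    """
    run_counts = {}
    for brands in runs:
        # dict.fromkeys drops duplicates within a run while keeping order.
        for brand in dict.fromkeys(brands):
            run_counts[brand] = run_counts.get(brand, 0) + 1
    return list(run_counts), run_counts
```

A brand that shows up in all three runs is a firmer citation than one that appears once, even though both count towards the union.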

Who in the team should own the audit?

A marketing analyst or content strategist with editorial judgement and enough sector knowledge to read the responses critically. The scoring requires recognising hedged citations, qualified mentions and cross-source contradictions, which takes domain familiarity rather than a particular job title. If you cannot resource this internally, an agency partner who runs the audit consistently across quarters is worth more than a one-off vendor scan.