Southern Sky AI — Calm AI Navigation for Maritime Leaders

An AI that answers your questions hands you words. An agent does the work itself: it sorts your inbox, edits your website, sends the follow-up. To do any of that, it has to read the outside world, and the outside world can write back. That is where prompt injection comes in. It's the one thing to understand before you hand an agent real work, and once you do, you can hand it over with confidence.

What prompt injection is

A language model reads everything as one stream of words. It can't really tell the difference between the instructions you gave it and the text it's reading while it carries them out. To the model, your request and the words on the page are the same kind of thing.

So an attacker hides instructions inside something the agent is going to read, written to look like they came from you. You never typed the bad instruction. It was sitting in the web page, the email, or the document the agent opened while doing what you asked. It can be a customer enquiry, a shared file, even text printed in white so a person would never see it. Any of it can carry an order the agent then follows.

The reason this matters today is that agents now have hands. When an AI could only chat, a hidden instruction couldn't do much harm. Now that an AI can open your website, create a user, send a message, or move a file, that same instruction becomes something done in your name, with whatever access your account has. Security bodies now rank prompt injection as the top risk for anything built on large language models (OWASP, 2025), and OpenAI's own security team says it's a problem that probably won't ever be fully solved (OpenAI, 2025). Every provider's agents work this way. It comes with giving an AI the ability to act, so the job is to manage it well.

Where it hides

It comes down to one question: does text written by someone other than you reach the agent while it works? That can happen in a lot of places.

Scenario	Where the hidden instruction lives	A plain example
Browsing the web for you	Any page the agent reads, including text hidden in tiny or matching-colour font	You ask it to summarise a competitor's page, and planted text on that page tells it to send a link to your contacts
Processing your email	The body of any message it reads	You ask it to triage the inbox, and one email instructs it to forward a document to an outside address
Your own site's open inputs	Comments, enquiry forms, reviews, anything the public can submit	You ask it to summarise this week's enquiries, and one entry contains an instruction to add a new administrator
Image and file details	Alt text, captions, filenames, document properties	You ask it to tidy the media library, and an uploaded image's alt text carries a command
Shared documents and PDFs	The body of a file someone sent you	You ask it to pull the key points from a supplier's PDF, and the PDF carries hidden directions
Downloaded skills, templates, or plugins	The instructions written inside the file you install	You add a ready-made skill found online to speed up a task, and it quietly carries instructions the agent obeys once loaded
External sources you point it at	Whatever sits on the page or document you send it to	A request to build a page like another one sends the agent to read a page an attacker controls
Connected tools and CRMs	Inbound leads, replies, and chat threads from strangers	You ask it to tag new leads, and a lead's message tells it to export your whole list

That downloaded-skill row catches people out, because nobody expects it. A skill or template is just a set of instructions you hand the agent to make it better at something. Grab one from a site you don't know and load it, and you're trusting a stranger to write part of your agent's brief. Treat where a skill came from as carefully as anything else you install.

What it can cost you, and the signs to watch for

How bad it can get comes down to three things: how much of what the agent's reading came from other people, how much damage its actions can do, and how valuable the data it can reach is. A marketing site full of your own words, run by an agent that can only edit drafts, sits at the gentle end. A logged-in agent reading strangers' messages, with the keys to your customer list, sits at the sharp end.

Prompt injection doesn't usually break anything. The job you asked for gets done, and the agent quietly does one extra thing on top. So the warning signs all look like the agent's scope creeping wider than you asked for.

-It proposes an action you did not ask for, especially anything to do with users, settings, payments, or installing something
-It wants to send, forward, or post to people you never mentioned
-It tries to visit or fetch a web address that has nothing to do with your task
-Its output has links or code in it you did not expect
-A simple request turns, on its own, into a string of changes to your system

If you ask an agent to summarise your comments and it offers to create an account, that's the injection right there. A tidy-looking result can still sit on top of a step you never asked for, so it pays to read what the agent did along the way.

How to keep using agents with confidence

The goal is to let an agent run free on the jobs where a mistake is cheap, and slow it down only where a mistake would cost you. Think of it as three dials. Turn down how much the agent is allowed to do, and turn down how much outside content it reads, and you can safely turn the third dial, its freedom, right up.

Four moves, strongest first.

Limit what the agent can reach. An account that can't create users, change settings, install code, or export data caps the damage, whatever turns out to be poisoned. This one helps most, because it works no matter where the bad instruction comes from.

Keep the agent on content you trust. If it only ever touches material you wrote yourself, there's nothing for anyone to plant. Treat anything from outside, a fetched page, a shared file, a public comment, as untrusted, and have the agent bring it back to you, so you decide what to do with it.

Keep the final say on the step you can't undo. Let the agent do the whole job and stop at the one thing you can't take back. It drafts the page, lays out the images, writes the messages, and you give the final yes to publish or send. Even if the agent has been hijacked, it gets right up to that button and stops, because you're the one who presses it.

Keep a safety net. Backups mean you can roll a change back. An activity log shows you exactly what the agent did afterwards. A test copy of your site lets it work with full freedom somewhere that isn't the real thing. So if something does slip through, you can see it and undo it.

The same thinking tells you how freely to let an agent work in any given setup.

Environment (with examples)	Injection risk	How to use an agent safely
A marketing or brochure site, all your own content (about page, service pages, a portfolio)	Low	Let it work freely on content and layout. Keep code or theme edits on a staging copy, with a backup taken first
A site that takes public input (a blog with comments, an enquiry form, reviews)	High	Run it under a limited account that cannot change users, settings, or code. Keep it away from the submitted content, or have it suggest while you decide
An online shop, booking, or membership system (customer records, orders)	High	Use a restricted role that cannot refund, export, or change accounts. Keep sensitive actions in your own hands, and build on staging with dummy data
A CRM handling inbound messages (leads, replies, chat)	High	Split it by direction: let it draft campaigns and templates from your input, and keep inbound messages to draft-only, with you sending
An email assistant that reads and actions your mail	High	Let it read and summarise. Keep sending, forwarding, and clicking with you
An agent browsing the web on your behalf	Medium to high	Treat anything it fetches as untrusted. Have it return information for you to use, and keep the acting in your hands
Installing skills, templates, or extensions from outside	Medium to high	Use trusted sources only. Read what a skill does before you load it, and try a new one in a low-privilege, low-stakes setting first

You can't make this risk disappear. As long as an agent reads from the outside world and can take actions, injection stays possible, which is why the people building these tools treat it as something they have to keep defending against. The smart move is to assume something will get through now and then, and set things up so that when it does, it ends up somewhere harmless and you can fix it. That's containment, and it's the same thing any decent security setup has always done.

Agents can do a lot for you. The trick is setting the limits up front, before you hand over the work, so that if a dodgy instruction does slip through, it has nowhere useful to go. Get that part right and you can let an agent run without losing sleep over it.

Kristina Agustin is the Founder and Principal Digital Navigator of Southern Sky AI, helping maritime and professional organisations adopt AI with capability and good governance.

Prompt Injection: The Instruction You Did Not Give

What prompt injection is

Where it hides

What it can cost you, and the signs to watch for

How to keep using agents with confidence

Further Reading

Continue Reading

Why Maritime Professionals Are Moving to Claude

Data That Builds Trust: Personalisation in Luxury Maritime

The 2026 AI Position Report — A Call for Contributions

Twelve Months of AI, and Losing Sleep — ASMEX 2026 Keynote

Begin the Conversation

The Chart Room Dispatch