A support agent wants to rewrite a complaint neatly and pastes the whole email thread into ChatGPT. At the bottom sit a name, an account number, a phone number, and an IBAN. The question was useful. The problem is the context that came along.
That is how an AI data leak usually starts: not with a hack, but with a paste. The good news: a short checklist and a rule of thumb prevent most of it.
Why this matters: a prompt is a processing moment
To you, an AI chatbot feels like a scratchpad with superpowers. Under the GDPR it is something else: a place where data is processed, possibly stored, remembered, analysed, or passed to a connected app. Whatever you type, you essentially hand over to the platform. In a personal or free account, that text may also be used to improve the model, unless you have turned that off.
So the first question is not whether the answer is correct, but which data you put in the prompt and whether it needed to be there.
What you never put in an AI prompt
This data does not belong in a chatbot, not even "just quickly":
- Passwords, PINs, and recovery codes. Login details go to no one.
- National ID, passport, or licence numbers. Unique numbers that make a person directly identifiable.
- Bank and credit card details. IBAN, card numbers, CVC.
- Medical information. Lab results, diagnoses, treatment notes. A chatbot is not bound by the same rules as your doctor.
- Customer and patient records. Names combined with files, contracts, or payment data.
- Confidential company data. Internal memos, strategy, non-public figures.
- Source code with secrets. API tokens, keys, configuration with credentials.
The common thread: the more a piece of data makes someone identifiable, or the more confidential it is, the stronger your reason to share it has to be. In practice, that reason almost never exists.
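For the source-code case in particular, a quick automated check before you paste can catch the obvious slips. The sketch below is only an illustration: the three patterns and the `find_secrets` helper are assumptions made for this example, and a real secret scanner uses a far larger rule set. It flags lines that look like a private key header, an AWS access key ID (these start with `AKIA`), or a credential assigned to a variable.

```python
import re

# Hypothetical, deliberately small rule set for illustration only.
SECRET_PATTERNS = {
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "assigned_credential": re.compile(
        r"(?i)(?:password|passwd|secret|token|api_?key)\w*\s*[:=]\s*['\"]?[^\s'\"]{6,}"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, label) for every line that matches a pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits


snippet = 'DB_HOST = "localhost"\nAPI_KEY = "not-a-real-key-123"\n'
print(find_secrets(snippet))  # [(2, 'assigned_credential')]
```

An empty result does not mean the snippet is safe to share. It only means none of these few patterns matched, so a manual look is still needed.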
What is usually fine
AI is most useful when you give it the task without the sensitive content:
- General questions. Explanation, structure, brainstorming, language and tone.
- Anonymised text. Replace real names with "Customer A", round amounts, drop locations.
- Dummy data and placeholders. Use name@example.com and Account 0000 instead of the real value.
- Your own non-confidential work. A blog, a public text, a draft without personal data.
You often do not need the real data to get the task done. Improving a review sentence works without the salary or medical context. Drafting a payment email works with a placeholder IBAN. That is the core idea: keep the task, minimise the traceable data. More techniques are in the prompt redaction guide.
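The placeholder idea can be sketched in a few lines of Python. This is a minimal example under stated assumptions, not a complete redaction tool: the `redact` function and the three regular expressions are made up for illustration, they cover only email addresses, IBAN-shaped strings, and phone-like digit runs, and they can both miss values and over-match. Names cannot be found this way at all, so those still need replacing by hand.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive PII detector).
# Order matters: IBANs are replaced before phone numbers so the digits
# inside an IBAN are not picked up as a phone number.
RULES = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("IBAN", re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b")),
    ("PHONE", re.compile(r"(?<![\w+])\+?\d[\d -]{7,}\d\b")),
]


def redact(text: str) -> str:
    """Replace each detected value with a numbered placeholder such as [IBAN_1]."""
    placeholders: dict[str, str] = {}  # real value -> placeholder
    counters: dict[str, int] = {}      # label -> number of distinct values so far

    for label, pattern in RULES:
        def substitute(match: re.Match, label: str = label) -> str:
            value = match.group(0)
            if value not in placeholders:
                counters[label] = counters.get(label, 0) + 1
                placeholders[value] = f"[{label}_{counters[label]}]"
            return placeholders[value]

        text = pattern.sub(substitute, text)
    return text


message = "Reply to jan@example.com or +31 6 12345678 about NL91 ABNA 0417 1643 00."
print(redact(message))
# Reply to [EMAIL_1] or [PHONE_1] about [IBAN_1].
```

The same value always maps to the same placeholder, so the chatbot can still tell "the first email address" from "the second" without ever seeing either one.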
Watch files and screenshots
Files are the biggest blind spot. A PDF can contain customer names, order numbers, contract parties, or retention periods. A phone screenshot can quietly show notifications, email addresses, calendar entries, or location data. What you do not see, you still upload. So check a file before you put it in an AI tool, or strip the sensitive parts first.
The rule of thumb for borderline cases
Not sure? Use these two tests:
- The stranger test. Would you email this to a stranger outside your organisation? If not, it does not belong in an AI chatbot.
- The minimum test. The fewer traceable details, the lower the risk. Leave out what the task does not need.
How to make this visible in the moment
The tricky part is that you often spot a sensitive detail only after you have sent it. A mental checklist helps, but under time pressure something slips through. That is why it helps to make the risk visible as you type.
BeeSensible highlights sensitive data while you write a prompt in browser-based AI tools. You see a coloured highlight on, say, a name, an IBAN, or an ID number, and you decide what to do: remove it, replace it with a realistic alternative, or mask it. The task stays intact, the traceable data comes out, before you send.
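To give a feel for what "highlighting" involves, here is a small sketch of one possible approach. It is not a description of how BeeSensible works internally; the `flag_ibans` and `iban_checksum_ok` names and the candidate pattern are assumptions for this example. It finds IBAN-shaped strings, keeps only those that pass the standard mod-97 checksum, and returns their character positions so that a user interface could mark them.

```python
import re

# Assumed candidate pattern: country code, two check digits, then groups of
# letters and digits, optionally separated by single spaces.
IBAN_CANDIDATE = re.compile(
    r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
)


def iban_checksum_ok(candidate: str) -> bool:
    """Apply the mod-97 check: a valid IBAN leaves a remainder of 1."""
    compact = candidate.replace(" ", "").upper()
    rearranged = compact[4:] + compact[:4]  # move country code and check digits to the end
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35, as the check requires.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def flag_ibans(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, value) for each candidate that passes the checksum."""
    return [
        (m.start(), m.end(), m.group(0))
        for m in IBAN_CANDIDATE.finditer(text)
        if iban_checksum_ok(m.group(0))
    ]


print(flag_ibans("Pay to NL91 ABNA 0417 1643 00 today"))
# [(7, 29, 'NL91 ABNA 0417 1643 00')]
```

The checksum step is a design choice to cut false alarms: a string that merely looks like an IBAN, such as an order reference, usually fails it and is left alone.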
Want to know whether your specific tool stores your data or uses it for training? Read about ChatGPT and work data, or see the broader overview on AI data leakage.