Is it safe to upload confidential documents to AI?

Only if the vendor contractually commits to not training on your data, gives you deletion control, and ideally holds a recognised security audit such as SOC 2 or ISO 27001. Free consumer tools rarely offer these protections, so they are a poor fit for confidential material. When in doubt, prefer a tool that reads the document in place, read-only, and check with whoever owns your information-security policy first.

Can AI companies see the documents I upload?

It depends on the vendor and the tier. Some restrict access tightly and delete documents quickly; others permit staff review or use your content to train models. Check the privacy policy and data-processing agreement (DPA) for who can access uploads and under what conditions. With tools that read documents in place rather than storing uploads, there is no retained copy for anyone to browse later.

Does Sidenote store my documents?

Only what you choose. Sidenote reads the document where it already lives - the page in your browser, or a SharePoint/Confluence file - rather than uploading a copy by default. Per document, you can pick Store, which keeps it indexed in your UK-hosted (eu-west-2) account for as long as your account is open, or Discard, which purges it within 24 hours. Every account is isolated with row-level security and encrypted at rest, and you can delete anything from your library at any time.

Which AI tools are safest for private files?

Favour tools that state plainly they do not train on your content, give you a deletion right, and can show a DPA or an audit such as SOC 2 or ISO 27001. Read-only, least-privilege access - rather than broad write access to a whole drive or wiki - is a strong signal too. Tools that read a document in place instead of uploading it remove an entire category of risk, since there is no second copy sitting on a server to leak or be repurposed.

Can my uploads be used to train AI models?

With many consumer AI tools, yes, unless you opt out or pay for a tier with no-training terms - check the privacy policy for the words "train" or "improve our models." Sidenote never fine-tunes models on your content, and the model and embedding providers it relies on (Anthropic and Voyage AI) run with no-training defaults on the tiers Sidenote uses, so nothing you read with it is used to improve anyone's model.

How can I check if an AI tool trains on my data?

Search the terms of service and privacy policy for the words "train," "improve our models," or "machine learning." A trustworthy vendor states plainly that customer documents and chats are excluded from training, and backs it with the no-training terms of its underlying model and embedding providers. If you cannot find a clear statement, ask support to confirm in writing before you upload anything sensitive.

Is It Safe to Upload Documents to AI? What to Check

Whether it is safe to upload documents to an AI tool depends entirely on what happens to the file after you hand it over. The same PDF could be deleted within minutes, or stored indefinitely and used to train a model - and most tools do not make the difference obvious. Before you paste a contract, a board pack, or a customer's data into an upload box, it is worth knowing exactly what you are agreeing to.

This guide walks through what to check before you upload documents to AI, in plain terms, with the specific questions to ask a vendor. It is written for people who handle sensitive material at work: legal, finance, HR, research, and anyone bound by an NDA or an information-security policy. None of this requires you to be a security engineer.

Is it safe to upload documents to AI? The short answer

It can be - but "AI tool" covers everything from a privacy-respecting product with a signed data-processing agreement to a free web app that quietly trains on whatever you feed it. Safety is not a property of AI in general; it is a property of the specific vendor's data handling. So the useful question is not "is AI safe" but "what does this tool do with my document, and can I verify it?"

There are five things worth checking. Work through them before the first upload, not after.

1. Data retention - how long is the file kept?

Find the retention policy before you upload. You are looking for a clear answer to: where is the document stored, for how long, and can you delete it on demand?

Good signs:

A stated retention window (for example, deleted after processing, or after your session ends).
A way to delete documents and chat history yourself, from the account.
Deletion that actually removes the data, not just hides it from your view.

Warning signs: no mention of retention at all, "we may retain data as long as necessary," or a policy that only covers your account metadata and stays silent on the document contents.

2. Model training - does your document teach the model?

This is the one that catches people out. Many consumer AI tools reserve the right to use your inputs to improve their models. For a meme that is harmless; for a merger agreement it is a confidentiality breach.

Ask directly: do you train on customer documents or chat content? The answer you want is an unambiguous no, ideally backed by the vendor's contracts with its underlying model providers. The major model and embedding providers offer enterprise terms under which submitted data is not used for training - but the AI product sitting in front of them has to actually be on those terms and pass that guarantee through to you. Get it in writing, usually in the terms of service or a data-processing agreement (DPA).
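Before asking the vendor, a first pass over its published terms can be automated. The sketch below is a minimal illustration, not a substitute for reading the policy: it assumes you have the policy as plain text, and the function name, phrase list, and sample policy are hypothetical. It pulls out each sentence that mentions a training-related phrase so you can read those sentences closely.

```python
import re

# Phrases worth searching for; an illustrative list, extend as needed.
TRAINING_PHRASES = ["train", "improve our models", "machine learning"]


def find_training_mentions(policy_text, phrases=TRAINING_PHRASES):
    """Return (phrase, sentence) pairs for sentences that mention a phrase."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", policy_text.strip())
    hits = []
    for sentence in sentences:
        for phrase in phrases:
            # \b keeps "train" from matching inside words like "constraint",
            # while still matching "training" and "trained".
            pattern = r"\b" + re.escape(phrase)
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                hits.append((phrase, sentence))
    return hits


# Made-up policy text, for illustration only.
sample_policy = (
    "We store account metadata for billing. "
    "We may use your content to improve our models. "
    "Customer documents are never used to train any model."
)

for phrase, sentence in find_training_mentions(sample_policy):
    print(f"[{phrase}] {sentence}")
```

A match is only a pointer to a sentence worth reading: the two flagged sentences in the sample say opposite things, which is exactly why the written confirmation still matters.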

3. Access scope - what can the tool reach?

If a tool connects to your Google Drive, SharePoint, or email rather than taking a single upload, scope matters as much as retention. A connector that asks for write access, or access to all files, is a much bigger attack surface than one that reads a single document on request.

Check:

Is access read-only, or can the tool modify and delete your files?
Is it scoped to what you are looking at, or does it ingest entire libraries?
Can your admin see and revoke the connection centrally?

Read-only, least-privilege access is the standard to hold vendors to. The narrower the scope, the smaller the blast radius if anything goes wrong.
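As a rough illustration of that check, the sketch below reviews a list of permission names that a connector requests on its consent screen. It uses Microsoft Graph permission names such as Files.Read and Files.ReadWrite.All as examples; the marker list and the function name are assumptions made for this example, the heuristic is deliberately simple, and it says nothing about what any particular product actually requests.

```python
# Substrings that suggest more than read access in a permission name.
# A heuristic assumed for this sketch, not an official classification.
WRITE_MARKERS = ("ReadWrite", "Manage", "FullControl")


def review_scopes(scopes):
    """Map each requested permission name to a list of concerns (empty if none)."""
    findings = {}
    for scope in scopes:
        concerns = []
        if any(marker in scope for marker in WRITE_MARKERS):
            concerns.append("can modify or delete content")
        if scope.endswith(".All"):
            concerns.append("broad scope (all accessible files or sites)")
        findings[scope] = concerns
    return findings


# Example consent-screen request, for illustration only.
requested = ["Files.Read", "Sites.Read.All", "Files.ReadWrite.All"]

for scope, concerns in review_scopes(requested).items():
    label = "; ".join(concerns) if concerns else "read-only and narrow"
    print(f"{scope}: {label}")
```

Anything the review flags is a question for the vendor or your admin, not an automatic rejection: some read-only connectors legitimately need a broad scope to find the document you point them at.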

4. Certifications and contracts - is there anything to verify against?

Marketing copy is not a guarantee. The things that actually bind a vendor are documents you can ask for:

A DPA that names sub-processors and commits them to no-training, deletion, and security terms.
A recognised audit such as SOC 2 Type II or ISO 27001 - evidence that an independent party checked the controls, not just that the vendor described them.
Regional hosting and GDPR commitments if you are in the UK or EU.

Be realistic about stage. A young product may not yet hold a SOC 2 report, and that is not automatically disqualifying - but it should be honest about what it does and does not have, and willing to put its data handling in writing. Vague claims of being "enterprise-grade" with nothing to back them are the real red flag.

5. Do you even need to upload?

The safest upload is the one that never happens. Every copy of a sensitive document is a new place it can leak from - a vendor's storage, a training set, a backup, a breached account. Reducing the number of copies is the most reliable privacy control there is.

This is where the upload-first model - the pattern behind tools like NotebookLM, where you assemble a notebook of uploaded sources before you can ask a question - starts to look outdated. Tools like Sidenote take a different approach: instead of asking you to upload a file, they read the document where it already lives - the Confluence page, the Notion doc, the PDF open in your browser, the SharePoint file - and answer your questions against it in place. By default no second copy is created, so there is nothing extra to retain, train on, or leak.

For connected sources, the access posture is deliberately narrow. Sidenote reads SharePoint and OneDrive read-only via Microsoft Graph, the official Microsoft API, so it can never modify or delete your files - and the same in-place, read-only principle applies to Confluence. On the model side, the providers Sidenote relies on for language and embeddings operate under no-training agreements, so your documents are not used to improve anyone's model. You can read the full posture on the security & compliance overview. The point is not that one product is perfect; it is that "read in place, read-only, no training" removes whole categories of risk that uploading reintroduces. For putting AI to work on sensitive documents safely, that design is the case for choosing a tool like Sidenote.

A quick pre-upload checklist

Before you upload any document to an AI tool, you should be able to answer yes to most of these:

I know how long the file is retained and how to delete it.
The vendor states it does not train on my documents or chats.
Access is read-only and scoped to what I need.
There is a DPA or recognised audit I can point my security team to.
I have confirmed I actually need to upload, rather than read the document in place.

If you cannot answer those, the safe default is not to upload - especially anything covered by an NDA, regulation, or your own company's data policy.
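For teams that want to record this decision consistently, the checklist can be written down as a small data structure. This is a sketch under stated assumptions: the field names are hypothetical, "most" is interpreted as at most one gap, and sensitive material is assumed to require every item. Adjust both thresholds to your own policy.

```python
from dataclasses import dataclass, fields


@dataclass
class PreUploadChecklist:
    """One yes/no answer per checklist item (field names are illustrative)."""
    retention_and_deletion_known: bool
    vendor_states_no_training: bool
    access_read_only_and_scoped: bool
    dpa_or_audit_available: bool
    upload_actually_needed: bool


def unanswered(checklist):
    """Return the names of the items that are not yet a yes."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def ok_to_upload(checklist, sensitive, max_gaps=1):
    """Apply the rule of thumb: most items must be a yes.

    Assumption for this sketch: sensitive material (NDA, regulated data,
    company policy) tolerates no gaps; anything else tolerates max_gaps.
    """
    allowed = 0 if sensitive else max_gaps
    return len(unanswered(checklist)) <= allowed


# Example: everything confirmed except a DPA or audit.
answers = PreUploadChecklist(
    retention_and_deletion_known=True,
    vendor_states_no_training=True,
    access_read_only_and_scoped=True,
    dpa_or_audit_available=False,
    upload_actually_needed=True,
)

print("Open items:", unanswered(answers))
print("OK for routine material:", ok_to_upload(answers, sensitive=False))
print("OK for sensitive material:", ok_to_upload(answers, sensitive=True))
```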
