File types & limitations

Vault formats: PDF, plain text, Word (.docx), Markdown (.md), CSV, and PowerPoint (.pptx) — plus size limits, OCR, and why some PDFs look fine on screen but carry no extractable text for the assistant.

Supported formats

The app accepts these file types for your local vault:

  • PDF: files whose name ends in .pdf
  • Plain text: .txt
  • Word (modern format): .docx (Office Open XML). Older .doc files are not supported; save as .docx first.
  • Markdown: .md (read as plain text)
  • CSV: .csv (rows are turned into readable sentences for search)
  • PowerPoint: .pptx (text from slides is extracted in order)

Other types are rejected with a clear error. The file picker may list additional extensions for convenience; only the types above are indexed into the vault.
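
To make the CSV entry in the list above more concrete, here is a minimal sketch of how rows might be turned into readable sentences. The function name rows_to_sentences and the "header is value" wording are assumptions for illustration only; the app's actual sentence format is not documented here.

```python
import csv
import io


def rows_to_sentences(csv_text: str) -> list:
    """Turn each CSV data row into one 'header is value' sentence.

    Hypothetical helper: the real wording used by the app may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    sentences = []
    for row in reader:
        # Skip empty cells (and cells without a header) so the sentence
        # only mentions values that are actually present.
        parts = [f"{key} is {value}" for key, value in row.items() if key and value]
        if parts:
            sentences.append(", ".join(parts) + ".")
    return sentences


sample = "name,city,amount\nAda,London,42\nLin,,7\n"
for sentence in rows_to_sentences(sample):
    print(sentence)
```

A sentence per row keeps each value next to its column name, which is one plausible reason a search index would prefer it over raw comma-separated lines.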

Size limits

Each file must be 100 MB or smaller. The same limit applies whether you use the file picker, drag-and-drop, or add files from chat — the app checks size before it starts processing.
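
The format and size checks described so far can be sketched as a single pre-processing step. This is an illustration under stated assumptions, not the app's code: the name check_upload is hypothetical, "100 MB" is read as 100 × 1024 × 1024 bytes, and extensions are compared case-insensitively.

```python
from pathlib import Path
from typing import Optional

# Extensions listed under "Supported formats".
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".md", ".csv", ".pptx"}
# Assumption: "100 MB" means 100 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def check_upload(name: str, size_bytes: int) -> Optional[str]:
    """Return an error message, or None if the file passes both checks.

    Hypothetical helper; it runs before any parsing would start.
    """
    # Assumption: the extension check ignores upper/lower case.
    extension = Path(name).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return f"Unsupported file type: {extension or '(none)'}"
    if size_bytes > MAX_FILE_BYTES:
        return "File is larger than 100 MB"
    return None


print(check_upload("notes.md", 2_000))       # passes both checks
print(check_upload("legacy.doc", 2_000))     # rejected: old Word format
print(check_upload("scan.pdf", 150 * 1024 * 1024))  # rejected: too large
```

Checking the cheap properties (name and size) first means an unsupported or oversized file is rejected before any time is spent reading its contents.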

Page count and hardware

There is no fixed maximum page count. Very large PDFs simply take longer and use more memory on your computer while the app prepares them. If processing feels stuck or extremely slow, try splitting the document or using a machine with more RAM and a fast SSD.

Practical limits follow your RAM, disk speed, and how busy the computer is—you will not find a separate “page limit” setting in the app.

The “invisible text” problem (scanned PDFs)

For PDFs, the app reads text that is already stored inside the file. It does not “read” a photograph of a page the way a human eye does — unless you enable Advanced Parsing (see below), which runs full OCR directly on your device.

When Advanced Parsing is turned on, the app performs both a richer layout pass for tables and structure and optical character recognition (OCR) on image-only pages. Scanned PDFs and photocopied documents can be fully indexed without any external tool — everything runs locally, never leaving your machine. In the default (fast) mode, only PDFs that already contain a text layer are searchable.

Quick check: open the PDF in any normal viewer and try to click and drag to highlight words, then copy them. If you cannot, the file has no text layer and is effectively a set of page images. Re-upload it with Advanced Parsing enabled so that OCR runs automatically, entirely on your device.
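
The same quick check can be approximated in code once the text of each page has been extracted. The sketch below is a heuristic only: the function name looks_image_only and the threshold of 20 characters per page are assumptions, and how the page text is obtained (for example with a PDF library of your choice) is left outside the example.

```python
def looks_image_only(page_texts: list, min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF lacks a usable text layer.

    page_texts holds the text extracted from each page, in order.
    Hypothetical heuristic: the threshold is an illustrative assumption.
    """
    if not page_texts:
        # No pages or no extraction result: treat as having no text layer.
        return True
    # Count non-whitespace-padded characters across all pages.
    total_chars = sum(len(text.strip()) for text in page_texts)
    # Flag the file when the average page carries almost no text.
    return total_chars < min_chars_per_page * len(page_texts)


scanned = ["", "  ", ""]
typed = ["Quarterly report for the finance team, page one of two.",
         "Revenue grew in every region during the reporting period."]
print(looks_image_only(scanned))
print(looks_image_only(typed))
```

A file flagged this way is a candidate for re-upload with Advanced Parsing enabled; a mixed document with a few scanned pages may still pass the average and need a per-page check instead.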

DOCX is still text-based

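For .docx files, the app indexes the typed text in the document. Text that appears only inside a picture embedded in a Word file is not read, and Advanced Parsing does not change this because it applies to PDFs only.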

Advanced Parsing: tables, layout, and OCR (PDF only)

On the Documents screen you can turn on the option labeled “Use Advanced Table Parsing (Slower Upload – Recommended for complex tables)”. This mode uses a deeper analysis engine that handles complex layouts and multi-column tables and, critically, runs OCR so that text on scanned or image-only pages is extracted. When this option is off, uploads use the faster path, which works well for PDFs that already contain selectable text.

Use Advanced Parsing whenever your PDF is scanned, photocopied, or contains dense tables. It is slower but produces significantly better results for those document types.