Sync Filter
A Sync Filter lets you specify which files to skip during ingestion.
For example, if you have a Google Drive folder full of text files that you want to ingest, plus the occasional image or PDF that you don't, you can use a Sync Filter to filter out the images and PDFs.
A Sync Filter acts on document metadata to decide whether a file should be ingested. You can attach a glob, or a list of globs, to each metadata key.
If any glob matches for any metadata key, that file will be skipped.
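The skip rule above can be sketched in a few lines of Python. This is only an illustrative model, not the service's actual implementation: the function name `should_skip` is hypothetical, and the sketch assumes the globs behave like Python's `fnmatch`, where `*` also matches `/`. Metadata values that are not strings are simply ignored here.

```python
from fnmatch import fnmatchcase


def should_skip(metadata: dict, sync_filter: dict) -> bool:
    """Return True if any glob under any filter key matches the metadata value."""
    for key, globs in sync_filter.items():
        # A filter value may be a single glob or a list of globs.
        if isinstance(globs, str):
            globs = [globs]
        value = metadata.get(key)
        if not isinstance(value, str):
            continue  # key missing or not a string: nothing to match in this sketch
        if any(fnmatchcase(value, glob) for glob in globs):
            return True
    return False


metadata = {"source_name": "invoice.pdf", "file_path": "/My Drive/files/invoice.pdf"}
print(should_skip(metadata, {"file_path": "*.pdf"}))             # True
print(should_skip(metadata, {"file_path": ["*.png", "*.jpg"]}))  # False
```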
Examples
Suppose we have the following document metadata for a Google Drive file. See the list of metadata you can filter on here.
{
  "source_type": "google_drive",
  "connector_id": "b12237e3-2cdc-4c8b-a06f-f0e8f26ff1d1",
  "created_at": "2026-04-03T22:43:58.641000+00:00",
  "source_name": "invoice.pdf",
  "folder": "files",
  "file_path": "/My Drive/files/invoice.pdf",
  "file_path_array": [
    "My Drive",
    "files",
    "invoice.pdf"
  ],
  "folder_path": "/My Drive/files",
  "created_by_email": "[email protected]",
  "created_by_name": "Andrey"
}
Filter out PDFs
Let's filter out all PDFs. The sync filter can be
{
  "file_path": "*.pdf"
}
This matches any file path that ends in .pdf, such as
- invoice.pdf
- sample.pdf
It will NOT match
- dog.png
- data.docx
Filter out certain names
What if you want to reject only invoice-like file names but still ingest other PDFs?
You can do
{
  "file_path": "*invoice.pdf"
}
This matches
- invoice.pdf
- 2025_invoice.pdf
It will NOT match
- ivoice.pdf (misspelled, so the pattern does not match)
- invoice.doc
Filter out multiple extensions
What if you want to filter out several file extensions, such as all PNGs, JPEGs, JPGs, and PDFs?
You can do
{
  "file_path": ["*.png", "*.jpg", "*.jpeg", "*.pdf"]
}
This matches
- dog.png
- invoice.pdf
- cat.jpeg
- cat.jpg
It does NOT match
- report.doc
- report_2026.txt
- report.md
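If you want to try a pattern list before saving it, a quick local dry run can help. The sketch below assumes the globs behave like Python's `fnmatch`; it is not the service's own matcher. The page does not say whether matching is case-sensitive. This sketch is case-sensitive, so a name like `DOG.PNG` would not match `*.png` here.

```python
from fnmatch import fnmatchcase

patterns = ["*.png", "*.jpg", "*.jpeg", "*.pdf"]
names = [
    "dog.png", "invoice.pdf", "cat.jpeg", "cat.jpg",
    "report.doc", "report_2026.txt", "report.md",
]

# A name is skipped as soon as any one pattern in the list matches it.
skipped = [n for n in names if any(fnmatchcase(n, p) for p in patterns)]
kept = [n for n in names if n not in skipped]

print("skipped:", skipped)
print("kept:", kept)
```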
Skipping a folder
Let's skip the folder "old_documents" in our Google Drive.
Suppose we have this folder structure
My Drive
- documents
- old_documents
- doc_1.pdf
- invoice.pdf
- etc
- old_documents
You can do
{
  "file_path": "*/old_documents/*"
}
This matches
- /My Drive/old_documents/doc_1.pdf
It does NOT match
- /My Drive/invoice.pdf
- /My Drive/invoices/invoice2026.pdf
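To see how far the folder pattern reaches, you can check it against a few paths locally. This sketch assumes `fnmatch`-style globs where `*` can span `/`, which may differ from the service's matcher. The nested path and the `old_documents_2024` folder are hypothetical examples, not taken from the drive above.

```python
from fnmatch import fnmatchcase

pattern = "*/old_documents/*"
paths = [
    "/My Drive/old_documents/doc_1.pdf",            # True
    "/My Drive/documents/old_documents/doc_1.pdf",  # True: nested folder (hypothetical path)
    "/My Drive/invoice.pdf",                        # False
    "/My Drive/invoices/invoice2026.pdf",           # False
    "/My Drive/old_documents_2024/doc_1.pdf",       # False: folder name differs (hypothetical path)
]

# Map each path to whether the folder pattern matches it.
results = {path: fnmatchcase(path, pattern) for path in paths}
for path, matched in results.items():
    print(f"{matched!s:5} {path}")
```

Under these assumptions the pattern skips an "old_documents" folder at any depth, but not folders whose names merely start with "old_documents".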
Multiple Keys
Suppose you want to filter out any files that are in an "old_documents" folder or were created by [email protected]
You can do
{
  "file_path": "*/old_documents/*",
  "created_by_email": "[email protected]"
}
As long as either metadata key matches, the file will be skipped.
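The "either key" behavior is a logical OR across keys, which can be sketched as follows. The address `someone@example.com` and the other metadata values are hypothetical placeholders, the helper name `is_skipped` is made up for this example, and `fnmatch`-style matching is an assumption rather than the service's documented behavior.

```python
from fnmatch import fnmatchcase

sync_filter = {
    "file_path": "*/old_documents/*",
    "created_by_email": "someone@example.com",  # hypothetical placeholder address
}


def is_skipped(metadata: dict) -> bool:
    # OR across keys: a single matching key is enough to skip the file.
    return any(
        fnmatchcase(metadata.get(key, ""), glob)
        for key, glob in sync_filter.items()
    )


in_old_folder = {
    "file_path": "/My Drive/old_documents/doc_1.pdf",
    "created_by_email": "other@example.com",
}
by_filtered_author = {
    "file_path": "/My Drive/files/notes.txt",
    "created_by_email": "someone@example.com",
}
neither = {
    "file_path": "/My Drive/files/notes.txt",
    "created_by_email": "other@example.com",
}

print(is_skipped(in_old_folder))       # True
print(is_skipped(by_filtered_author))  # True
print(is_skipped(neither))             # False
```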