Partitions

In Ragie, documents may be logically separated into partitions. Retrievals accept an optional partition parameter that, when present, scopes the retrieval to documents in the given partition. If it is omitted, the retrieval is scoped to an implicit "default" partition. Partitions serve a number of use cases, such as segregating user data in multi-tenant SaaS applications or defining distinct knowledge bases for use in different contexts. Partitions are optional: if you do not use them, all documents exist in the "default" partition for your tenant.

A partition is specified as a string of lowercase alphanumeric characters, which may also include the _ and - special characters.
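The naming rule above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the rule is exactly "one or more lowercase letters, digits, underscores, or hyphens". The helper name is_valid_partition is hypothetical and not part of any Ragie SDK, and any length limit Ragie may enforce is not modeled.

```python
import re

# Assumption: a valid partition is one or more of a-z, 0-9, "_" or "-".
_PARTITION_RE = re.compile(r"[a-z0-9_-]+")


def is_valid_partition(name: str) -> bool:
    """Return True if `name` only uses the documented characters."""
    return _PARTITION_RE.fullmatch(name) is not None


print(is_valid_partition("acme-corp_42"))  # lowercase, digits, "-" and "_"
print(is_valid_partition("Acme Corp"))     # uppercase letters and a space
```

Validating locally avoids a round trip for names that would be rejected anyway, but the server remains the source of truth for what is accepted.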

Partitions may also be used to improve retrieval results. Ragie uses a hybrid approach when performing retrievals that includes searching a keyword index. These keyword indexes are also separated by partition. A relevant detail here is that keyword importance (or weight) is partially determined by how frequently the keyword appears in the set of documents (inverse document frequency). For example, legal jargon in a set of law documents would carry relatively less weight than the same jargon appearing in customer service documents, since the legal terms would presumably appear far more frequently in the law documents. Partitions are a useful construct for separating documents by domain to improve the quality of the keyword portion of Ragie’s hybrid search.
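To make the inverse document frequency effect concrete, here is a small illustration. The source does not state which scoring formula Ragie uses, so this sketch uses a common smoothed IDF variant, log((1 + N) / (1 + df)) + 1, purely as an assumption. The two tiny corpora stand in for a "legal" partition and a "support" partition.

```python
import math


def idf(term: str, documents: list[str]) -> float:
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(documents)
    # df = number of documents that contain the term as a whole word
    doc_freq = sum(1 for doc in documents if term in doc.lower().split())
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0


legal_docs = [
    "the indemnity clause limits liability",
    "indemnity applies to each party",
    "the indemnity survives termination",
]
support_docs = [
    "how to reset your password",
    "refund policy and indemnity terms",
    "contact support by email",
]

idf_legal = idf("indemnity", legal_docs)
idf_support = idf("indemnity", support_docs)

# The term is common among the legal documents, so it weighs less there.
print(idf_legal < idf_support)
```

If both sets of documents shared one partition, the term's weight would be computed over the combined set, which is why splitting by domain can change keyword ranking.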

Working with Partitions

Creating partitions

A partition is automatically created whenever a document, connection, or instruction is created in it.

Documents and partitions

Documents can be created in a partition by providing an optional partition string when creating them. If the partition is omitted, the document is created in the "default" partition.

Retrievals

The optional partition parameter can be provided when doing retrievals. When present, the retrieval is scoped to that partition. Fine-grained scoping via metadata filters may be combined with partition scoping.
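As a rough sketch of combining the two kinds of scoping, the helper below assembles a retrieval request body. Only the partition parameter comes from the text above. The field names "query" and "filter", the shape of the filter value, and the helper name build_retrieval_body are assumptions made for illustration, so check the Retrievals API reference for the real request schema.

```python
import json
from typing import Any, Optional


def build_retrieval_body(
    query: str,
    partition: Optional[str] = None,
    metadata_filter: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a retrieval request body, leaving out unset optional fields."""
    body: dict[str, Any] = {"query": query}  # "query" is an assumed field name
    if partition is not None:
        body["partition"] = partition  # scope the retrieval to this partition
    if metadata_filter is not None:
        body["filter"] = metadata_filter  # assumed field name and filter shape
    return body


body = build_retrieval_body(
    "What is the refund window?",
    partition="tenant-42",
    metadata_filter={"department": "support"},  # hypothetical metadata key
)
print(json.dumps(body, indent=2))
```

Omitting the partition key entirely, rather than sending an empty value, leaves the retrieval on the default scoping described above.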

Connections and partitions

Connections can be created in a partition by providing an optional partition string when creating them. If the partition is omitted, documents managed by the connection will be in the "default" partition.

Entity Extraction Instructions and partitions

Instructions can be created in a partition by providing an optional partition string when creating them. If the partition is omitted, the instruction will attempt to extract entities from documents in all partitions. If a partition is provided, the instruction will only execute on documents in that partition.

Partition handling for other endpoints

Many other endpoints support scoping the request to a partition using the partition HTTP header or a parameter with the same name in the Ragie SDKs. If partition is omitted, the request is scoped to the "default" partition. One caveat applies to accounts created prior to 1/9/2025, whose requests are scoped to all partitions when partition is omitted. Those accounts may opt in to stricter partition enforcement by contacting [email protected]. If you're using partitions, it's strongly encouraged to explicitly set partition on requests that support it.
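One way to follow the recommendation to always set the partition explicitly is to build request headers through a single helper that requires it. This is a minimal sketch: the header name partition comes from the text above, while the Bearer authorization scheme and the helper name build_headers are assumptions for illustration.

```python
def build_headers(api_key: str, partition: str) -> dict[str, str]:
    """Return HTTP headers that always carry an explicit partition."""
    if not partition:
        # Refuse to fall back silently to default or account-wide scoping.
        raise ValueError("partition must be set explicitly")
    return {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "partition": partition,
    }


headers = build_headers("YOUR_API_KEY", "tenant-42")
print(headers["partition"])
```

Making the argument mandatory means a forgotten partition fails loudly in your own code instead of changing which documents a request can see.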

Endpoints that support partition scoping

Managing partitions

Explicitly creating partitions

While partitions can be created implicitly through various operations, such as creating a document, they can also be created explicitly using Create Partition. This is useful when a partition must have limits defined before any documents are created in it.

Getting usage information on partitions

Get Partition may be used to fetch usage information about a partition. This information includes pages_processed_monthly, pages_hosted_monthly, pages_processed_total, pages_hosted_total, and document_count. These usage statistics align with how Ragie bills for usage and can provide a means to forward usage costs to end customers. Most of these statistics have definable limits that, when exceeded, prevent the partition from accepting new documents. The limit_exceeded_at field is populated with an ISO 8601 date-time string if the partition is limited.
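Client code can use limit_exceeded_at to decide whether a partition is currently accepting documents. The sketch below assumes the Get Partition response has already been decoded into a dictionary with limit_exceeded_at at the top level. The exact response layout, the sample values, and the helper name are assumptions.

```python
from datetime import datetime
from typing import Any, Optional


def limit_exceeded_at(partition_info: dict[str, Any]) -> Optional[datetime]:
    """Return when the partition became limited, or None if it is not limited."""
    raw = partition_info.get("limit_exceeded_at")
    if not raw:
        return None
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


# Hypothetical decoded response; values are made up for illustration.
sample = {
    "name": "tenant-42",
    "document_count": 12,
    "limit_exceeded_at": "2025-03-01T00:00:05Z",
}

limited_since = limit_exceeded_at(sample)
if limited_since is not None:
    print(f"Partition limited since {limited_since.isoformat()}")
```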

Setting limits on partitions

Limits may be set on a partition using Set Partition Limits. Doing so immediately checks the partition's current usage and limits the partition if any limits are exceeded. Going forward, the limits are checked after documents in the partition are processed or deleted. Monthly limits are re-evaluated at roughly midnight UTC on the first of the month. Limits may be set for pages_processed_limit_monthly, pages_hosted_limit_monthly, pages_processed_limit_max, and pages_hosted_limit_max. When a limit is exceeded, a webhook event with the type partition_limit_exceeded is emitted; it includes the partition's name and the type of limit that was exceeded.
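The check described above can be mirrored on the client, for example to warn a customer before they hit a limit. This sketch rests on two assumptions that the text does not confirm: that each limit pairs with the usage statistic of the matching name (with the _max limits capping the _total statistics), and that "exceeded" means strictly greater than the limit. The function name is hypothetical.

```python
from typing import Optional

# Assumed pairing of each limit with the usage statistic it caps.
LIMIT_TO_STAT = {
    "pages_processed_limit_monthly": "pages_processed_monthly",
    "pages_hosted_limit_monthly": "pages_hosted_monthly",
    "pages_processed_limit_max": "pages_processed_total",
    "pages_hosted_limit_max": "pages_hosted_total",
}


def exceeded_limits(
    usage: dict[str, float], limits: dict[str, Optional[float]]
) -> list[str]:
    """Return the names of all limits whose paired statistic is above the limit."""
    exceeded = []
    for limit_name, stat_name in LIMIT_TO_STAT.items():
        limit = limits.get(limit_name)
        # A missing or None limit is treated as "no limit set".
        if limit is not None and usage.get(stat_name, 0) > limit:
            exceeded.append(limit_name)
    return exceeded


usage = {"pages_processed_monthly": 1200, "pages_hosted_total": 300}
limits = {"pages_processed_limit_monthly": 1000, "pages_hosted_limit_max": 500}
print(exceeded_limits(usage, limits))
```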

Filtering Webhook Endpoint calls

Webhook Endpoints may be configured to only receive calls related to specific partitions. This can be helpful when multiple apps interact with Ragie or when handling multiple environments. Learn more in the webhooks documentation.

Deleting partitions

Partitions may be deleted using Delete Partition. Deleting a partition deletes all data associated with it, including its documents, connections, instructions, and entities. This operation is irreversible and should be used with care.

Common use cases

Multi-tenant SaaS

Multi-tenant applications will generally want to isolate their users’ data to prevent data leakage between users. Apps may want to use a USER_ID as their partition key or potentially an ORG_ID if the app is more multiuser in nature. Isolating user and organization data is as simple as providing the desired partition key when managing documents and doing retrievals. More fine-grained retrieval scoping via metadata filters is still possible and can be combined with partitions.
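If your user or organization IDs are already lowercase alphanumerics (for example numeric IDs or lowercase UUIDs), they can be used as partition keys directly. For arbitrary IDs, the sketch below shows one possible way to derive a key that stays within the allowed characters without mapping two different IDs to the same partition: it hex-encodes the ID's UTF-8 bytes. The "user-" prefix and the function name are hypothetical, and any maximum partition length is not accounted for.

```python
def partition_for_user(user_id: str) -> str:
    """Derive a partition key from an arbitrary user ID.

    Hex encoding yields only 0-9 and a-f, and distinct IDs always
    produce distinct keys, so tenants cannot collide by accident.
    """
    if not user_id:
        raise ValueError("user_id must not be empty")
    return "user-" + user_id.encode("utf-8").hex()


# IDs that differ only by case still map to different partitions.
print(partition_for_user("Ab"))
print(partition_for_user("ab"))
```

Simply lowercasing or stripping characters would be shorter, but it can merge distinct IDs into one partition, which defeats the isolation the partition is meant to provide.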

Isolated Knowledge Bases

If an organization has multiple distinct domains of knowledge that it wants to use as sources for its generative AI applications, creating partitions for those domains will improve the quality of the keyword component of Ragie’s hybrid search approach. Depending on the use case, creating a distinct partition for each function, such as customer support, legal, or HR, may be the ideal approach. This is not a one-size-fits-all recommendation, and it may affect how you structure your retrievals. If you have any questions, we’re always happy to discuss the particulars of your use case and help you design the best approach.