Partitions
In Ragie, documents may be logically separated into partitions. Retrievals can provide an optional partition parameter that, when present, scopes the retrieval to documents in the given partition. If omitted, retrievals are scoped to an implicit "default" partition. Partitions can be used for a number of use cases, such as segregating user data in multi-tenant SaaS applications or defining distinct knowledge bases for use in different contexts. Partitions are an optional feature and, if not used, all documents will exist in a "default" partition for your tenant.
A partition name is a lowercase alphanumeric string that may also include the special characters _ and -.
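A client-side check of these naming rules can catch a bad partition name before a request is sent. The following is a minimal sketch: the helper name is_valid_partition_name is hypothetical, and it assumes nothing beyond the character rules stated above (any length limits Ragie may enforce are not covered).

```python
import re

# Lowercase letters, digits, underscore, and hyphen only (per the rule above).
_PARTITION_RE = re.compile(r"[a-z0-9_-]+")


def is_valid_partition_name(name: str) -> bool:
    """Return True if `name` uses only characters allowed in a partition name."""
    # fullmatch() requires the whole string to match, so "" and "a b" are rejected.
    return _PARTITION_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["tenant_42", "legal-docs", "Tenant42", "a b"]:
        print(candidate, is_valid_partition_name(candidate))
```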
Partitions may also be used to improve retrieval results. Ragie uses a hybrid approach when performing retrievals that includes searching a keyword index. These keyword indexes are also separated by partition. A relevant detail here is that keyword importance (or weight) is partially determined by how rarely the keyword appears in the set of documents (inverse document frequency). For example, legal jargon in a set of law documents would be relatively less important than the same jargon appearing in customer service documents, since the legal terms would presumably appear far more frequently in the law documents. Partitions are a useful construct for separating documents by domain to improve the quality of the keyword portion of Ragie's hybrid search.
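To make the inverse document frequency effect concrete, the sketch below computes a textbook smoothed IDF for the same term in two small, made-up document sets. This is an illustration only: the formula, the idf helper, and the sample documents are assumptions, and Ragie's actual keyword scoring is not specified here.

```python
import math


def idf(term: str, documents: list[list[str]]) -> float:
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(documents)
    doc_freq = sum(1 for doc in documents if term in doc)
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0


# Hypothetical tokenized documents in two separate partitions.
law_partition = [
    ["indemnity", "clause", "tort"],
    ["tort", "liability", "indemnity"],
    ["indemnity", "contract", "breach"],
]
support_partition = [
    ["refund", "shipping", "delay"],
    ["password", "reset", "login"],
    ["refund", "indemnity", "policy"],
]

# The term appears in every law document but in only one support document,
# so it carries more weight in the support partition.
idf_law = idf("indemnity", law_partition)
idf_support = idf("indemnity", support_partition)

if __name__ == "__main__":
    print(f"law: {idf_law:.3f}  support: {idf_support:.3f}")
```

If both sets shared a single index, the term's weight would fall somewhere in between, which is the dilution that separating documents by domain avoids.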
Working with Partitions
Creating partitions
A partition is created automatically whenever a document, connection, or instruction is created in it.
Documents and partitions
Documents can be created in a partition by providing an optional partition string when creating them. If a partition is omitted, the document will be created in the "default" partition.
Retrievals
The optional partition parameter can be provided when doing retrievals. When present, the retrieval will be scoped to that partition. Fine-grained scoping via metadata filters may be combined with partition scoping.
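Combining a partition with a metadata filter might look like the request body assembled below. This is a sketch under stated assumptions: only the partition parameter comes from the text above, while the query and filter field names, the filter shape, and the build_retrieval_body helper are hypothetical and should be checked against the API reference.

```python
import json
from typing import Any, Optional


def build_retrieval_body(
    query: str,
    partition: Optional[str] = None,
    metadata_filter: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a retrieval request body; optional fields are left out when unset."""
    body: dict[str, Any] = {"query": query}  # assumed field name
    if partition is not None:
        body["partition"] = partition  # scope the retrieval to this partition
    if metadata_filter is not None:
        body["filter"] = metadata_filter  # assumed field name for metadata filters
    return body


body = build_retrieval_body(
    "What is our refund policy?",
    partition="customer-support",
    metadata_filter={"department": "billing"},  # hypothetical filter shape
)

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```

Leaving partition out of the body entirely, rather than sending an empty value, mirrors the documented behavior in which an omitted partition falls back to the "default" partition.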
Connections and partitions
Connections can be created in a partition by providing an optional partition string when creating them. If a partition is omitted, documents managed by the connection will be in the "default" partition.
Entity Extraction Instructions and partitions
Instructions can be created in a partition by providing an optional partition string when creating them. If a partition is not provided, the instruction will attempt to extract entities from documents in all partitions. If a partition is provided, the instruction will only execute on documents in that partition.
Partition handling for other endpoints
Many other endpoints support scoping the request to a partition using the partition HTTP header or a parameter with that same name in the Ragie SDKs. If partition is omitted, the requests will be scoped to the "default" partition. One caveat to this behavior is accounts created prior to 1/9/2025, which will have the requests scoped to all partitions. Those accounts may opt in to stricter partition enforcement by contacting Ragie support. If you're using partitions, it's strongly encouraged to explicitly set partition on requests that support it.
Endpoints that support partition scoping
- Get Document
- Delete Document
- Update Document File
- Update Document Raw
- Patch Document Metadata
- Get Document Chunks
- Get Document Chunk
- Get Document Summary
- Get Instruction Extracted Entities
- Get Document Extracted Entities
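For the endpoints listed above, one way to follow the advice to always set the partition explicitly is to make it a required argument of whatever helper builds the request headers. The sketch below assumes a bearer-token Authorization header, which is not stated in this page, and the build_headers helper is hypothetical; only the partition header name comes from the text.

```python
def build_headers(api_key: str, partition: str) -> dict[str, str]:
    """Build request headers that always carry an explicit partition."""
    if not partition:
        raise ValueError("partition must be set explicitly")
    return {
        "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        "partition": partition,  # header name as described above
    }


headers = build_headers("YOUR_API_KEY", "tenant_42")

if __name__ == "__main__":
    print(headers)
```

Requiring the argument means a forgotten partition fails loudly in the client instead of silently falling back to the "default" partition, or to all partitions on older accounts.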
Managing partitions
Explicitly creating partitions
While partitions can be created implicitly through various operations like creating a document, they can also be explicitly created using Create Partition. This can be useful when a partition must have limits defined before any documents are created in it.
Getting usage information on partitions
Get Partition may be used to fetch usage information about a partition. This information includes pages_processed_monthly, pages_hosted_monthly, pages_processed_total, pages_hosted_total, and document_count. These usage statistics align with how Ragie bills for usage and can provide a means to forward usage costs to end customers. There are definable limits associated with most of these statistics that will prevent a partition from accepting new documents when exceeded. The limit_exceeded_at field will be populated with an ISO 8601 date-time string if a partition is limited.
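As a rough sketch of how these fields might be consumed, the code below checks whether a partition is limited and computes a pass-through charge from its monthly usage. It assumes the usage fields sit at the top level of the Get Partition response; the helper names, the flat per-page price, and all sample values are hypothetical.

```python
from datetime import datetime
from typing import Any, Optional


def limited_since(partition_info: dict[str, Any]) -> Optional[datetime]:
    """Return when the partition became limited, or None if it is not limited."""
    raw = partition_info.get("limit_exceeded_at")
    if not raw:
        return None
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def monthly_charge(partition_info: dict[str, Any], price_per_page: float) -> float:
    """Charge for pages processed this month at a flat, made-up rate."""
    return partition_info["pages_processed_monthly"] * price_per_page


# Hypothetical response shape; the values are illustrative only.
sample = {
    "pages_processed_monthly": 1200,
    "pages_hosted_monthly": 800,
    "pages_processed_total": 5400,
    "pages_hosted_total": 3100,
    "document_count": 42,
    "limit_exceeded_at": "2025-03-01T00:05:00Z",
}

if __name__ == "__main__":
    print("limited since:", limited_since(sample))
    print("charge:", monthly_charge(sample, price_per_page=0.01))
```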
Setting limits on partitions
Limits may be set on a partition using Set Partition Limits. Doing so will immediately check the partition's current usage and limit the partition if any limits are exceeded. Going forward, the limits will be checked after documents in the partition are processed or deleted. Monthly limits are re-evaluated at roughly midnight UTC on the first of the month. Limits may be set for pages_processed_limit_monthly, pages_hosted_limit_monthly, pages_processed_limit_max, and pages_hosted_limit_max. When limits are exceeded, a webhook event with the type partition_limit_exceeded will be emitted that includes the partition's name and the type of limit that was exceeded.
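A webhook receiver could react to this event along the lines of the sketch below. Only the partition_limit_exceeded event type comes from the text; the payload envelope and the partition and limit_type field names are assumptions made for illustration, so the real payload shape should be confirmed against the webhook documentation.

```python
from typing import Any, Optional


def describe_limit_event(event: dict[str, Any]) -> Optional[str]:
    """Return an alert message for partition_limit_exceeded events, else None."""
    if event.get("type") != "partition_limit_exceeded":
        return None
    payload = event.get("payload", {})  # assumed envelope key
    partition = payload.get("partition", "<unknown>")  # assumed field name
    limit_type = payload.get("limit_type", "<unknown>")  # assumed field name
    return f"Partition {partition!r} exceeded its {limit_type} limit"


# Hypothetical event body, for illustration only.
sample_event = {
    "type": "partition_limit_exceeded",
    "payload": {"partition": "tenant_42", "limit_type": "pages_hosted_limit_max"},
}

if __name__ == "__main__":
    print(describe_limit_event(sample_event))
```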
Deleting partitions
Partitions may be deleted using Delete Partition. Deleting a partition deletes all data associated with it, including documents, connections, instructions, and entities. This operation is irreversible and should be used with care.
Common use cases
Multi-tenant SaaS
Multi-tenant applications will generally want to isolate their users’ data to prevent data leakage between users. Apps may want to use a USER_ID as their partition key or potentially an ORG_ID if the app is more multiuser in nature. Isolating user and organization data is as simple as providing the desired partition key when managing documents and doing retrievals. More fine-grained retrieval scoping via metadata filters is still possible and can be combined with partitions.
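Deriving the partition key from a USER_ID or ORG_ID might be sketched as follows. The user- and org- prefixes and the partition_key helper are hypothetical choices, and the character check reflects only the naming rule that partition names are lowercase alphanumeric plus _ and -.

```python
import re
from typing import Optional

_ALLOWED = re.compile(r"[a-z0-9_-]+")


def partition_key(user_id: str, org_id: Optional[str] = None) -> str:
    """Pick an org-level key when an ORG_ID is given, else a user-level key."""
    key = f"org-{org_id}" if org_id else f"user-{user_id}"
    # Reject rather than rewrite invalid IDs: silently normalizing them could
    # map two different tenants onto the same partition.
    if _ALLOWED.fullmatch(key) is None:
        raise ValueError(f"ID does not form a valid partition name: {key!r}")
    return key


if __name__ == "__main__":
    print(partition_key("1234"))
    print(partition_key("1234", org_id="acme"))
```

The same key would then be passed both when creating documents and when doing retrievals, so that each tenant only ever reads what was written to its own partition.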
Isolated Knowledge Bases
If an organization has multiple distinct domains of knowledge that it wants to use as sources for its generative AI applications, creating partitions for those domains will improve the quality of the keyword component of Ragie's hybrid search approach. Depending on the use case, creating a distinct partition for each function, such as customer support, legal, or HR, may be the ideal approach. This is not a one-size-fits-all recommendation, and it may affect how you structure your retrievals. If you have any questions, we're always happy to discuss the particulars of your use case and help you design the best approach.